ABSTRACT

The  segmentation  of  imagery  into  homogeneous  regions  using  digital  techniques 
has  been  a goal  of  researchers  for  the  past  several  years.  Pattern  recognition 
approaches  using  mathematical  models  have  achieved  results  which  are  only 
partially  satisfactory.  The  large  dimension  of  the  pattern  space  and  the  quantity 
of  data  involved  in  the  digital  representation  of  images  are  in  part  responsible  for 
the  limited  applicability  of  these  approaches.  Other  shortcomings  are  related  to 
the  demands  for  data  with  which  to  train  the  classifier. 

Approaches based on linguistic models have also been tried, again with
results which are partially satisfactory. The most serious shortcomings are
related to the performance of these approaches in the presence of noise, a
phenomenon with which man has learned to function effectively.

This dissertation describes a procedure for segmenting imagery using digital
techniques and is based on the mathematical model. The classifier does not
require training prototypes, that is, it operates in an "unsupervised" mode. The
procedure  is  general  in  that  the  features  most  useful  for  the  particular  image  to 
be  segmented  are  selected  by  the  algorithm.  The  algorithm  operates  without 
any  human  interaction. 

The features used are based on brightness and texture in regions centered on
every picture element in the image. To perform an elementary pre-classification
of local regions, a filter based on the mode of the local area histogram is
proposed and used in segmenting images.

The basic procedure is a K-means clustering algorithm which converges to
a local minimum in the average squared inter-cluster distance for a specified
number of clusters. The algorithm iterates on the
number  of  clusters,  evaluating  the  clustering  based 
on  a parameter  of  clustering  quality.  The  parameter 
proposed  is  a product  of  between  and  within  cluster 
scatter  measures,  which  achieves  a maximum  value 
that  is  postulated  to  represent  an  intrinsic  number  of 
clusters  in  the  data. 

It  has  been  impossible  in  the  past  to  compare 
different  segmentations  of  the  same  image.  A 
comparison  measure  based  on  the  joint  histogram 
of  the  two  segmentations  is  proposed  and  examples 
of  its  use  are  presented. 

It  is  within  the  state  of  the  art  to  adapt  the 
segmentation  procedure  described  herein  to  operate 
in  hardware  at  television  rates.  A functional  diagram 
of  such  a system  is  presented,  and  estimates  of  the 
required  capacities  are  given. 

Key Words: Scene Analysis, Pattern Recognition, Image Segmentation,
Clustering, Digital Image Processing.
Chapter 1


INTRODUCTION 

This dissertation describes a procedure for automatically segmenting
images into regions using digital techniques. The background of this
procedure lies in image understanding systems, an expansion of image
processing systems that attempt to draw meaningful inferences from
visual data. An important step in forming inferences about the visual
data is to segment the image into regions of homogeneity to aid further
analysis.

For  the  purposes  of  this  report,  a "segmented  image"  is  defined 
to  be  an  image  wherein  each  picture  element  (pixel)  in  the  image  is 
assigned  a number  corresponding  to  the  index  number  of  the  segment 
to  which  it  belongs. 
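
As a minimal sketch of this definition, a segmented image can be held as a two-dimensional array of segment index numbers. The array contents and the helper name `segment_sizes` below are illustrative assumptions and are not taken from the dissertation.

```python
from collections import Counter

# A hypothetical 4 x 5 segmented image: each entry is the index number
# of the segment to which that pixel belongs.
segmented = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [2, 2, 2, 1, 1],
    [2, 2, 2, 2, 1],
]


def segment_sizes(label_image):
    """Count how many pixels carry each segment index."""
    return Counter(label for row in label_image for label in row)


sizes = segment_sizes(segmented)
```

Every pixel carries exactly one index, so the segment sizes always sum to the number of pixels in the image.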

1.1 Research Objectives

The goal of the research described herein is to develop a reasonably
fast algorithm for segmenting images into regions that correspond in a
large degree to areas that would be perceived as essentially homogeneous
by a human interpreter. The algorithm does not use context-related
information such as shape and relative position. Either monochromatic
or color imagery may be segmented utilizing the same algorithm, with a
somewhat expanded feature set for the color imagery to take advantage
of the multispectral information.

In the past, most evaluations of image understanding systems
have  been  performed  by  subjective  judgement.  It  is  not  completely 
clear  how,  given  two  different  segmentations  of  the  same  image,  one 
segmentation  is  judged  better  than  the  other.  Ultimately,  the  value  of 
an  image  segmentation  system  will  lie  in  its  potential  usefulness  and 
in  its  ability  to  segment  imagery  in  a manner  that  to  some  degree 
emulates  human  perception.  Nevertheless,  it  would  be  useful  to  be 
able  to  numerically  compare  two  different  segmentations  of  the  same 
image.  To  that  end,  a comparison  measure  is  proposed  and  examples 
of  compared  segmentations  are  given. 
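
The abstract states that the comparison measure is based on the joint histogram of the two segmentations. The measure itself is not defined in this part of the text, so the sketch below computes only the joint histogram. The function name `joint_histogram` and the sample label images are assumptions made for illustration.

```python
from collections import Counter


def joint_histogram(seg_a, seg_b):
    """Count pixels for every (label in seg_a, label in seg_b) pair.

    Both inputs are label images of identical size, given as lists of rows.
    """
    same_shape = len(seg_a) == len(seg_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(seg_a, seg_b)
    )
    if not same_shape:
        raise ValueError("segmentations must have the same dimensions")
    counts = Counter()
    for row_a, row_b in zip(seg_a, seg_b):
        for label_a, label_b in zip(row_a, row_b):
            counts[(label_a, label_b)] += 1
    return counts


# Two segmentations of the same 2 x 3 image that differ in one pixel
# and use different index numbers for corresponding segments.
seg_a = [[0, 0, 1], [0, 1, 1]]
seg_b = [[5, 5, 7], [5, 5, 7]]
hist = joint_histogram(seg_a, seg_b)
```

When two segmentations agree up to a renumbering of the segments, each label of one segmentation pairs with exactly one label of the other; off-pattern entries such as `(1, 5)` above count the pixels on which they disagree.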

1.2 Organization of the Dissertation

The second chapter provides an overview of image understanding systems
in general and of past approaches to image segmentation. The approach
taken in this dissertation is the only segmentation procedure based
entirely on clustering that has been reported in the literature. While
clustering has been used to refine and identify image segmentations in
the past, it has previously been believed that a pure clustering
approach was too cumbersome computationally to implement.

The third chapter consists of a theoretical development of the
background of clustering. Additional tools of statistical data analysis
are developed to determine the intrinsic number of clusters in the
data, and a novel arrangement of these tools is proposed to provide
a reliable and unambiguous stopping criterion for the algorithm.
Previous work in each of the areas brought to bear on the problem
is outlined and its relationship to the work contained herein is
discussed.
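
The idea of iterating on the number of clusters and scoring each clustering can be sketched as follows. This is an illustration under stated assumptions, not the dissertation's implementation: the abstract says only that the quality parameter is a product of between and within cluster scatter measures, so the particular scatter definitions in `quality`, the farthest-first initialisation, and all function names are hypothetical.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def kmeans(points, k, max_iter=100):
    """Plain K-means; returns (labels, centers) for a fixed k."""
    # Farthest-first initialisation (an assumption) keeps runs deterministic.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: local minimum reached
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels, centers


def quality(points, labels, centers):
    """Product of a within-cluster and a between-cluster scatter (assumed form)."""
    k = len(centers)
    grand_mean = centroid(points)
    within = 0.0
    between = 0.0
    for j, center in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            continue
        within += sum(squared_distance(p, center) for p in members) / len(members)
        between += squared_distance(center, grand_mean)
    return (within / k) * (between / k)


def scan_cluster_counts(points, k_max):
    """Cluster for k = 1..k_max and return the k with the largest quality."""
    scores = {}
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, k)
        scores[k] = quality(points, labels, centers)
    return max(scores, key=scores.get), scores


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_k, scores = scan_cluster_counts(points, k_max=4)
```

With a single cluster the between-cluster term vanishes, and as the number of clusters approaches the number of points the within-cluster term vanishes, so a product of this kind peaks somewhere in between; on the toy data above the peak falls at two clusters.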

Chapter four is a detailed description of the approach taken. Block
diagrams and flow charts of the algorithm are provided along with the
rationale for the various procedures used. A complete description of
the feature sets used to segment images is provided and the various
combinations of these features are justified, based on results
obtained. To obtain an elementary pre-classification of region
character, a novel filter based on the mode of the local area histogram
is proposed and used to segment images.
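
A filter based on the mode of the local area histogram can be sketched as below. The window size, the handling of image borders, and the tie-breaking rule are assumptions chosen for illustration; the details of the filter actually used are given in Chapter 4 and are not reproduced here.

```python
from collections import Counter


def mode_filter(image, radius=1):
    """Replace each pixel by the most frequent value in its local window.

    `image` is a list of rows of integer grey levels. The window is a
    (2 * radius + 1) square, clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            counts = Counter(window)  # local area histogram
            top = max(counts.values())
            # Ties go to the smallest grey level (an arbitrary choice).
            out[r][c] = min(v for v, n in counts.items() if n == top)
    return out


noisy = [
    [10, 10, 10, 50],
    [10, 99, 10, 50],
    [10, 10, 50, 50],
]
smoothed = mode_filter(noisy)
```

The isolated value 99 is replaced by the dominant local grey level, while the boundary between the 10 and 50 regions is not averaged into intermediate values.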

The results obtained on several kinds of images are described in
detail in Chapter 5. In some cases, images were segmented with more
than one feature set in an attempt to improve performance.
Segmentations of images produced under various conditions are compared
using a comparison measure developed for that purpose.

The particular approach taken to image segmentation in this
dissertation lends itself to "real time" implementation, that is, it is
possible to construct electronic hardware to segment images at
television rates. Chapter 6 is a functional description of a real time
implementation of the algorithm which was programmed on a general
purpose digital computer. The procedure described in Chapter 4 was
used to segment two frames of a motion picture. The segmentations
are included to demonstrate the ability of the algorithm to produce
essentially equivalent segmentations of spatially non-stationary
images.

Finally, Chapter 7 draws conclusions from the results and provides
directions in which further research is judged necessary.
Chapter 2


IMAGE UNDERSTANDING SYSTEMS

2.1 Image Understanding System Description

An image understanding system is a system that uses visual data
to  generate  descriptions  that  are  useful  for  desired  applications.  The 
descriptions  generated  can  be  at  very  different  levels  and  degrees  of 
detail.  If  an  image  is  represented  in  digital  form,  then  the  image  is 
represented  by  an  array  of  numbers  representing  the  brightness  at 
each  point  on  a (usually)  rectangular  grid.  These  brightness  elements 
are  called  picture  elements  (pixels).  In  the  limiting  case,  this  array 
of  numbers  "describes"  the  image. 

Image descriptions of this form are usually the starting point for
image understanding systems. The system generates a series of
descriptions that are progressively more general until a descriptive
level is reached that satisfies the system requirements. It has been
observed [2-1] that the successive levels of abstraction require that
the higher levels of the system interact with the lower levels, based
on the current descriptions. This processing approach is called
"heterarchical." The image understanding system is therefore
conceptualized as having a hierarchy of processing levels, as shown in
Figure 2.1.

Figure 2.1. Image Understanding System

The primitive description level extracts local features that are not
related to context. The primary or "first order" features of a pixel in
a monochrome image are the brightness (with due consideration of the
sensor spectral response) and spatial location of the pixel. All other
features are of higher order, that is, they describe how the pixel is
related to surrounding pixels in the image. These features describe
such primitive local attributes of the picture as brightness, texture
and color. A proper primitive description level of the image
understanding system would transform the features into a coordinate
system where numerical distance would be related to human perceptual
difference.

A large percentage of the difficulty with current schemes for image
understanding can be related to the lack of understanding of the human
perceptual system. Preliminary work towards relating texture features
to human perception has been performed by Thompson [2-2]. He used a
rank order experiment to define a combination of texture features. This
combination forms a perceptual distance function which correlates to
some degree with human perception. Further work along these same lines
on other features, combined with greater understanding of the human
perceptual system, will greatly improve the operation of image
understanding systems.

The symbolic description level of the system takes the primitive
descriptions and forms more global and symbolic descriptions of the
image. Segmentation of the image takes place at this level. The initial
segmentation is based purely on perceptual difference. After analysis
by the semantic interpretation level of the system, the symbolic level
may be directed to merge or to further divide regions in the image.

The decisions about dividing the scene into similar or homogeneous
regions are made at this level of the image understanding system. The
notion of "similar" is a poorly defined concept. Consider the problem
of grouping automobiles, busses and airplanes. These three items are
different in obvious respects, and in certain circumstances, it would
be legitimate to group them separately. It is also true that they are
all vehicles used for transportation, and they could be grouped
together. The transportation specialist might group busses and
airplanes together as representing forms of mass transit, whereas the
average person might group busses and automobiles together because they
are both land vehicles. For these reasons, it is obvious that any
grouping of data must be performed with a specific intent. Feedback
from the semantic interpretation level is necessary to ensure that the
symbolic descriptions are consistent with the goals of the image
understanding system.

The semantic interpretation level of the system generates hypotheses
for the contents of the image based on the symbolic descriptions. The
semantic interpretation level then further directs the lower processing
levels until the symbolic descriptions confirm one of the hypotheses.

2.2 Other Image Understanding System Models

A number of somewhat different models have been proposed other than the
model suggested here. It has been suggested, for example, that a goal
directed or "top-down" approach be used to look for a specific object
in, or test a specific hypothesis about, an image. Examples of this are
discussed in [2-3] and are typified by locating telephones in an indoor
office scene. Other examples of this approach are the location of
specific objects in x-ray radiographs [2-4].

The problem with top-down approaches is that the specific circumstances
under which the system operates must be well defined in advance. Any
substantial departure from these circumstances will cause the system to
fail to perform adequately.

Other models represent a middle ground between the completely top-down
and the completely bottom-up approaches. These models differ mainly in
that they use knowledge of the scene at the earliest possible stage of
the image understanding system to refine the scene description as it is
generated [2-5, 2-6].

2.3 Image Segmentation Approaches

In all of these image understanding system approaches, gross overall
image segmentation is necessary to direct the attention of the higher
system levels, form preliminary hypotheses about the image (such as
whether it is an aerial photograph or indoor scene, etc.) and identify
areas to be examined in greater detail or merged with other areas of
lesser interest. The image segmentation procedure must be sufficiently
general that it will operate satisfactorily over a wide range of image
types and within a wide range of possible implementations of image
understanding systems. The segmentation procedure must be reasonably
efficient in terms of computer time and storage required in order not
to require unnecessary resources to implement. Many previous approaches
have required several hours of computer time to implement. It has been
observed that "It is usually possible to solve a difficult problem in a
difficult manner by brute force and ignorance. However, real advances
are made by recognizing difficulties and avoiding them." [2-7]

At the current state of the art, it is fashionable to invoke the
"other level of the system" argument when the difficult interfaces
between the image understanding system levels are encountered. This
argument inevitably insists that some (usually the most challenging)
aspect of the problem is that which the higher (lower) level of the
system will solve. Not invoking this argument requires that the gross
overall image segmentation be performed with some degree of autonomy;
in other words, it must decide on a segmentation without close
supervision from the higher levels of the system. If more or less
detail about a particular region is desired, the higher level of the
system can either merge regions or direct that regions be further
segmented. The number of regions with which the higher levels of the
system must deal must be kept to a minimum, to permit reasonable
implementation of the higher system levels.

Segmentation of images into homogeneous regions has been a goal of
image understanding researchers for many years. Beginning with simple
block-like objects and the work of Roberts [2-8], image segmentors
evolved into those attempting to segment natural scenes. Roberts' work
used intensity to detect object boundaries for further manipulation.
Later efforts [2-9, 2-10, 2-11, 2-12, 2-13] manipulated the line
drawings in different ways, but extracted these line drawings as a
pre-processing step for higher level operations.

Extension of artificial intelligence based procedures to image
segmentation often used "top-down" approaches based on a priori
knowledge of the image content. Many of these approaches used training
algorithms to train the classifier and highly heuristic features based
on the a priori knowledge of the image and the purpose of the image
understanding system [2-14, 2-15, 2-16, 2-17, 2-18]. Some of these
approaches were interactive, that is, a human operator provided
guidance to the computer to direct the segmentation [2-19]. An
excellent description of each of these segmentation approaches and the
context in which they were applied is contained in [2-20].

Common to all of these approaches is the extraction of line drawings by
varying means. Thus the region boundaries represent the segmentation of
the image. In some of these approaches, the edges are sought directly
by edge detection [2-21, 2-22, 2-23, 2-24] or functional approximation
[2-25, 2-26]. In other approaches, the regions are detected first and
the boundaries determined later. There are two general approaches to
region detection. The first is a top-down approach wherein the picture
is segmented into progressively smaller regions until certain criteria
are satisfied. Examples of these approaches are found in [2-27, 2-28].
The second approach is a bottom-up approach wherein the picture is
divided into a large number of small regions, possibly as small as one
pixel. These regions are successively merged to form larger regions.
Examples of this approach are given in [2-29, 2-3

A few attempts at bottom-up approaches to image segmentation using
clustering have been made in the past. The first of these was performed
by Haralick and Kelly [2-31]. This procedure used a modified linking or
"nearest neighbor" rule to form the clusters on multi-image data. The
procedure uses two arbitrary thresholds or parameters, the maximum
number of clusters and a probability threshold parameter. The histogram
is computed and peaks are isolated to accelerate the location of
cluster centers. Naturally, the performance depends on the parameters
selected.

Further work has been performed using textural features and a
classifier operating in the supervised mode [2-32]. The supervised mode
requires that the cluster centers be determined by training, that is,
samples whose classification is known are used to identify the cluster
centers.

Clustering has also been applied to images segmented by an edge
detection procedure [2-33]. The procedure used was: 1) compute a
gradient image, 2) threshold the gradient image, 3) clean the
thresholded gradient image, 4) label connected regions in the cleaned
image, and 5) cluster the labeled, connected regions. Thus, clustering
is used to merge and identify segmentations after they are formed. A
number of thresholds are required in forming and cleaning the gradient
image and labeling the connected regions. This procedure is a
combination of former types, utilizing both edge detection and region
detection to form the segmentation.
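
The front end of such an edge-based procedure can be sketched as follows. This is an illustration only, not the method of [2-33]: the gradient operator, the threshold value and the 4-connected labeling rule are assumptions, and the cleaning step (3) and the clustering step (5) are omitted because the text does not specify them.

```python
from collections import deque


def gradient_magnitude(image):
    """Step 1: sum of absolute forward differences (an assumed operator)."""
    rows, cols = len(image), len(image[0])
    grad = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dx = image[r][min(c + 1, cols - 1)] - image[r][c]
            dy = image[min(r + 1, rows - 1)][c] - image[r][c]
            grad[r][c] = abs(dx) + abs(dy)
    return grad


def label_regions(edge_mask):
    """Step 4: label 4-connected groups of non-edge pixels, starting at 1.

    Edge pixels keep the label 0.
    """
    rows, cols = len(edge_mask), len(edge_mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if edge_mask[r0][c0] or labels[r0][c0]:
                continue
            count += 1
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    inside = 0 <= rr < rows and 0 <= cc < cols
                    if inside and not edge_mask[rr][cc] and not labels[rr][cc]:
                        labels[rr][cc] = count
                        queue.append((rr, cc))
    return labels, count


# A hypothetical image with a dark left half and a bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]
grad = gradient_magnitude(image)
edge_mask = [[g > 4 for g in row] for row in grad]  # step 2, threshold assumed
labels, n_regions = label_regions(edge_mask)
```

The gradient threshold in this sketch is one example of the thresholds that, as noted above, such a procedure requires.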

An additional bottom-up approach to image segmentation is described by
Ohlander [2-34]. This procedure uses histogram analysis to successively
delete points contained in feature histogram peaks. The feature
histograms are then recomputed and the process repeated. The initial
system required considerable human interaction involving peak finding
and selection, selection of connected regions and data base
manipulation. Later work [2-20] refined and accelerated the procedure
based on a priori knowledge about feature usefulness and sub-region
analysis.
Chapter 3


PATTERN RECOGNITION, UNSUPERVISED LEARNING
AND CLUSTERING

A large  body  of  information  and  techniques  has  been  built  up  over 
the last several decades under the general subject heading of pattern
recognition.  It  is  convenient  to  divide  this  body  of  knowledge  into 
two  categories.  The  first  category  consists  of  theory  and  techniques 
from  statistical  data  analysis  and  communication  theory.  The 
second  category  contains  knowledge  that  is  most  closely  related  to 
computer  artificial  intelligence. 

3.1 Artificial Intelligence Approaches

The artificial intelligence approaches often use language theory to
describe a scene in terms of primitive elements or subpatterns and
their relationship to each other. The relationships are described in
the syntactic structure models of formal language [3-1]. Visual
patterns are considered to belong to a two-dimensional language. The
structural description of these patterns in terms of the grammar is
the syntax. Recognition becomes syntax analysis (often called
parsing). The limitations of these approaches are that relatively
little work has been done in noisy syntax and that most existing
linguistic schemes are in terms of shape, which is but one of many
features available to human observers. Nevertheless, context is easily
visualized in such an approach as additional constraints on the
relationships between the primitive elements.

3.2 Mathematical Models

The first results obtained in the general discipline now called
pattern recognition were based on mathematical models. These models
[3-2] assume that a sensor or series of sensors measure physical
quantities about an object in the real world as shown in Figure 3.1.
In general, the measurements of the sensors form a vector that
describes the object. In the case of visual data, the sensors are
usually some form of camera, perhaps extracting multispectral
measurements about the physical world. At each point $(x, y)$ the
output of the $i$th sensor at time $t$ is

$$P_i(x,y,t) = \int_0^\infty F(x,y,t,\lambda)\,V_i(\lambda)\,d\lambda \tag{3-1}$$

where $V_i(\lambda)$ is the spectral response of the $i$th sensor and
$F(x,y,t,\lambda)$ is the brightness of the physical world sensed at
point $x, y$ in the sensory plane of the $i$th sensor at time $t$
[3-3]. Similar to the definition for one dimensional time signals, the
time average of the image at $(x, y)$ is

$$\langle P_i(x,y,t) \rangle = \lim_{T\to\infty} \left\{ \frac{1}{2T} \int_{-T}^{T} P_i(x,y,t)\,L(t)\,dt \right\} \tag{3-2}$$

where $L(t)$ is a time weighting function. The image at this point is
still in continuous form. For purposes of manipulation of the data by
digital computer, the image must be converted into digital form by
appropriately sampling the image. Thus the images are represented
as real-valued functions of two spatial variables whose value at a
point is related to the spectral and time integrals given in equations
(3-1) and (3-2).

Figure 3.1. Classical Mathematical Pattern Recognition Model
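
In practice the spectral integral of equation (3-1) is evaluated from sampled data. The sketch below approximates it with the trapezoidal rule at a single point and instant. The wavelength grid, the flat radiance and the ramp-shaped spectral response are hypothetical values chosen so that the result can be checked by hand; the function name `sensor_output` is likewise an assumption.

```python
def sensor_output(wavelengths, radiance, response):
    """Trapezoidal approximation of the integral of F(lambda) * V_i(lambda).

    All three arguments are equally long lists sampled at the same
    wavelengths; the response is taken to be zero outside that range.
    """
    total = 0.0
    for k in range(len(wavelengths) - 1):
        step = wavelengths[k + 1] - wavelengths[k]
        left = radiance[k] * response[k]
        right = radiance[k + 1] * response[k + 1]
        total += 0.5 * (left + right) * step
    return total


# Hypothetical samples from 400 nm to 700 nm in 10 nm steps.
wavelengths = [400.0 + 10.0 * k for k in range(31)]
radiance = [1.0] * 31                                   # flat F
response = [(w - 400.0) / 300.0 for w in wavelengths]   # linear ramp V_i
p_i = sensor_output(wavelengths, radiance, response)
```

For this linear integrand the trapezoidal rule is exact up to rounding, and the value equals the area of a triangle of base 300 and height 1, that is, 150.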

The pattern space consists of the spectral samples just described. The
"first order" features of an image are its brightness (possibly in
several spectral regions) and the x and y coordinates of the
appropriate point. Each point is usually called a pixel (picture
element). Other features, such as texture, are properties of a region
[2-2]. Thus the feature extraction process may, in the case of images,
considerably enlarge the amount of data required to represent the
image. This increase causes the model to differ somewhat from the
classical pattern recognition model where the feature extraction
process usually performs a data compression by representing entire
objects with a single vector of features.

The feature space, as described above, represents a high dimensional
(dimension > 10 is not uncommon) space in which each point in the
image is represented by a vector of features
$P(x,y) = (P_1(x,y), P_2(x,y), \ldots, P_n(x,y))$. Here $n$ is the
dimension of the feature space and $P_i(x,y)$ is the value associated
with dimension $i$ at point $(x, y)$.

The  classification  problem  is  now  to  find  separating  surfaces 
in  n dimensions  which  will  partition  the  feature  space  into  K mutually 
exclusive  and  collectively  exhaustive  regions.  The  classification 
which  results  from  assigning  the  vectors  in  accordance  with  a 
particular  partitioning  of  the  feature  space  can  then  be  evaluated 
based  on  the  purpose  of  the  classification. 

The model just described often assigns the vectors by discriminant
functions which are functionals of the feature vectors. Thus

$$g_k(P(x,y)) > g_j(P(x,y)) \quad \text{for all } j = 1, 2, \ldots, K \ (j \neq k) \tag{3-3}$$

implies that the feature vector $P$ is a member of class $W_k$. The
discriminant functions usually assume the form of distance functions.
The assumption is normally made that the feature space forms a
metric space. The metric defined must satisfy the following conditions
with respect to vectors $P_1$, $P_2$ and $P_3$ in the space:

i) $m(P_1, P_2) = m(P_2, P_1)$ (3-4)

ii) $m(P_1, P_2) \le m(P_1, P_3) + m(P_2, P_3)$ (3-5)

iii) $m(P_1, P_2) \ge 0$ and $m(P_1, P_2) = 0$ iff $P_1 = P_2$. (3-6)
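
As an illustration of discriminant functions that take the form of distance functions, the sketch below assigns a feature vector to the class whose center is nearest, using g_k(P) = -m(P, c_k), and checks conditions (3-4) to (3-6) numerically on a few sample vectors. The Euclidean metric, the class centers and the function names are assumptions made for the example.

```python
import math
from itertools import product


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def classify(p, centers, metric=euclidean):
    """Return the index k whose discriminant g_k(P) = -m(P, c_k) is largest."""
    scores = [-metric(p, c) for c in centers]
    return max(range(len(centers)), key=scores.__getitem__)


def satisfies_metric_conditions(vectors, metric=euclidean, tol=1e-12):
    """Check symmetry, the triangle inequality and positivity on all triples."""
    for p1, p2, p3 in product(vectors, repeat=3):
        symmetric = abs(metric(p1, p2) - metric(p2, p1)) <= tol
        triangle = metric(p1, p2) <= metric(p1, p3) + metric(p2, p3) + tol
        non_negative = metric(p1, p2) >= 0
        identity = (metric(p1, p2) <= tol) == (p1 == p2)
        if not (symmetric and triangle and non_negative and identity):
            return False
    return True


centers = [(0.0, 0.0), (10.0, 10.0)]      # hypothetical class centers
vectors = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0), (-2.0, 7.5)]


def squared(p, q):
    """Squared Euclidean distance: symmetric, but not a metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q))
```

A check of this kind cannot prove that a function is a metric, but it can expose one that is not: the squared Euclidean distance fails the triangle inequality for the collinear points 0, 1 and 2.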

This model of the feature space when applied to the image segmentation
problem implicitly assumes that numerical difference is directly
proportional to perceptual difference in the human perceptual system.
This is an assumption which is almost certainly untrue at the current
state of knowledge about the human perceptual system and the current
state of development of features used in digital image pattern
recognition techniques. Nevertheless, the existence of a (almost
certainly) nonlinear transformation can be postulated which would map
the feature vectors into a new space where the model described
previously would be perceptually valid. The theory and techniques
currently being applied to pattern recognition approaches are equally
applicable in the new space. It would be anticipated that the results
obtained in this new space would more closely emulate the human
perceptual system.

3.3 Statistical Decision Theory Applications

The extension of the model developed thus far to statistical
decision theory is straightforward; each image is considered to be a
sample function of a two dimensional random process and each feature
vector is a (vector) random variable. The classes defined by
the discriminant functions become decision regions. Depending on
how much is known (or is assumed) about the underlying statistics of
the feature vectors, the many different forms of statistical decision
theory can be applied.

These methods implicitly define the concept of "similarity." The
scaling of the feature functionals and the selection of the features to be
used implicitly define how the pattern classifier is to interpret
similarity. This suggests that, in reality, the classical pattern
recognition system is better represented by Figure 3.2.

3.4 Supervised Pattern Recognition

The  determination  of  the  discriminant  functions  in  the  traditional 
pattern  recognition  system  is  made  through  the  use  of  prototypes  or 
training  samples  whose  correct  classification  is  known.  These 
samples  are  fed  to  the  system  and  establish  the  decision  boundaries 
for  use  in  classifying  unknown  samples.  This  approach  is  often 
called  the  "supervised"  pattern  recognition  approach. 

The selection of the training prototypes, the selection of features
and the cost weighting of the feature space effectively define to the
classifier what is intended by "similar." Since similarity is highly
context dependent, the best results that can be hoped for using this
methodology are to classify based on non-context related criteria.

This conceptualization points out the reason for some of the
disappointing results in past efforts which are based on classical
pattern recognition. Similarity is a defined concept and depends on
context. The mathematical approach does not readily lend itself to
the application of contextual criteria.

Figure 3.2 Reformulated Pattern Recognition Model

3.5 Unsupervised Pattern Recognition

Frequently, it is desirable to design a pattern classification
system without the use of training samples [3-4]. This is often
called the "unsupervised" approach. There are a number of reasons
for using the unsupervised approach.

i) The number and characteristics of the classes may not be
known a priori.

ii) Obtaining a sufficient number of prototypes to train the
classifier is difficult and time consuming.

iii) Underlying structure in the data may be overlooked if
training prototypes are used.

iv) In many applications, the characteristics of the patterns
change slowly with time. Satisfactory performance can
still be obtained if the classifier can track the changes
using unsupervised techniques.

The  theoretical  framework  on  which  unsupervised  pattern 
recognition  is  based  is  very  tenuous.  If  nothing  whatsoever  is  known 
about  the  data,  the  problem  is  not  solvable  in  general.  That  this  is 
so  is  obvious  from  the  fact  that  a nonlinear  transformation  on  the 
feature  space  could  be  defined  which  would  reorganize  the  data  in 
any  desired  form.  The  reorganized  data  would  be  equally  as  valid 
as  the  original  data  if  nothing  whatsoever  is  known  about  the  data  at 
the  outset. 

In the case of image related data, it is known a priori (or at
least assumed) that the data represents low level perceptual
differences. It is to be expected that regions of the image that appear the
same would produce feature vectors that are near to each other
whereas regions that appear substantially different would produce
feature vectors that are far apart. This assumption leads naturally
to the expectation that similar appearing regions will produce groups
of vectors that are close together in feature space. These groups of
vectors will hereafter be called "clusters."

3.6 Clustering

In general, the term clustering refers to the grouping of a given
set of objects into subsets according to the properties of each object.
The subsets are required to contain objects that are in some sense
more similar to each other than to the objects in other subsets.
Clustering has been used for several decades, and was first applied
by Tryon to numerical taxonomy problems [3-5].

It has been previously pointed out that the theoretical basis for
unsupervised learning using clustering techniques is weak at best.
It has been observed by Watanabe [3-6] that under certain conditions,
there is no theoretical basis at all for clustering and unsupervised
learning. His observations emanate from philosophical grounds and
proceed as follows. Suppose every object to be clustered is
described by n binary descriptions. No loss of generality is incurred
since any object described by a finite number of finite precision
numbers can be described in this manner.

Thus, in theory, there are $N = 2^{2^n}$ different possible descriptions
if the set of descriptions includes all complementary descriptions.
Hence, every object is described by an equal number of binary ones
and zeros. It is also true that if there is at least one binary
description that is different between two objects, then it follows that
there are exactly N/4 binary positions in which the objects are
described identically (a proof of this is contained in [3-7]). From
this follows the somewhat startling conclusion that any two objects
are as similar to one another as any other two objects when the
degree of similarity is measured by the number of identical binary
descriptions. It follows that there is no such thing as a class of
similar objects in the world.
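
The counting argument can be reproduced by brute force for small n.
The sketch below is an illustration under stated assumptions, not
Watanabe's proof: each of the 2^n objects is identified by an integer,
each of the N = 2^(2^n) descriptions is treated as a Boolean function
encoded as a bit mask over the objects, and "described identically" is
taken to mean that a description holds for both objects. The function
name is hypothetical.

```python
from itertools import combinations


def shared_description_counts(n):
    """Count, for every pair of distinct objects, the descriptions true of both."""
    num_objects = 2 ** n                  # objects distinguishable by n binary attributes
    num_descriptions = 2 ** num_objects   # N = 2^(2^n) Boolean functions of the attributes
    counts = {}
    for a, b in combinations(range(num_objects), 2):
        shared = 0
        for description in range(num_descriptions):
            # Bit k of `description` is 1 when the description holds for object k.
            if (description >> a) & 1 and (description >> b) & 1:
                shared += 1
        counts[(a, b)] = shared
    return num_descriptions, counts


N, counts = shared_description_counts(2)
print(N, set(counts.values()))
```

For n = 2 every pair shares the same number of descriptions, N/4,
which is the sense in which no pair is more similar than any other
when all descriptions carry equal weight.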

The  above  conclusion  does  not  coincide  with  intuition  or 
empirical  observation.  The  apparent  conflict  is  eliminated  if  it  is 
assumed  that  some  of  the  binary  descriptions  are  more  "important" 
than  others.  For  example,  binary  descriptions  that  correspond  to 
most  significant  bits  are  much  more  "important"  than  descriptions 
that correspond to least significant bits. As has been concluded
previously, it is obviously of great importance in the classification
process to define the concept of similarity in the selection and
weighting (importance) of the descriptions or features.

There are any number of clustering procedures, each having its
own peculiar characteristics. An extremely detailed discussion of
numerous different clustering techniques is contained in [3-8].

There are, however, certain similarities between the various
techniques that permit them to be categorized. Ball [3-8] defines 7
different cluster-seeking techniques for finding similar subsets in
data. One of these he calls "clustering techniques," which are
distinguished by the iterative sorting of the data using multiple
cluster points until the cluster means "adequately" describe the data.

When it is anticipated that the clusters are tight and widely
spaced, the chain method [3-9], [3-10] may be used. The first data
point  is  taken  to  be  the  starting  point  of  the  first  cluster.  If  the 
distance  to  the  second  data  point  exceeds  a threshold,  the  second 
data  point  becomes  the  starting  point  of  a new  cluster.  The  distance 
from  each  succeeding  data  point  to  every  member  of  every  cluster 
is  computed,  and  the  point  is  included  in  an  existing  cluster  if  its 
minimum  distance  is  below  a threshold.  The  procedure  runs  into 
trouble  when  the  clusters  are  close  together  and  the  boundaries  are 
indistinct. 
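
The chain method as described above can be sketched in a few lines.
This is an illustrative reading of the description, not the
implementation of [3-9] or [3-10]; the function name, the use of the
Euclidean distance and the sample points are assumptions.

```python
import math


def chain_cluster(points, threshold):
    """Assign each point to the cluster holding its nearest earlier point,
    or start a new cluster when that nearest distance is not below threshold."""
    if not points:
        return []
    clusters = [[points[0]]]          # the first point starts the first cluster
    for p in points[1:]:
        best_cluster, best_dist = None, math.inf
        for cluster in clusters:
            for member in cluster:    # distance to every member of every cluster
                d = math.dist(p, member)
                if d < best_dist:
                    best_cluster, best_dist = cluster, d
        if best_dist < threshold:
            best_cluster.append(p)
        else:
            clusters.append([p])      # too far from everything seen so far
    return clusters


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (1.0, 0.0)]
print(chain_cluster(points, threshold=2.0))
```

Because a point joins a cluster when it is close to any single member,
two clusters linked by a few intermediate points tend to merge, which
is the difficulty noted above for close clusters with indistinct
boundaries.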

There are a number of procedures which will iterate to a local
minimum in the average distance from each sample to the nearest
cluster mean. Perhaps the best example of these procedures is the
nearest means algorithm adapted by Ball and Hall [3-11] and called
ISODATA.

This procedure begins with an assumed number of clusters. The
means are arbitrarily assigned, although the initial mean assignment
will affect the clustering through the number of iterations required
for convergence. The data is then assigned to the nearest mean.
After all of the data points have been assigned, the cluster means are
recomputed based on the assigned data points. This process continues
until the data assignment does not change, at which point the process
is said to have converged. This algorithm will iterate to a local
minimum in the average within cluster distance.
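
The assign-and-recompute loop just described can be sketched as
follows. This is a minimal illustration of the nearest means step
only, without the merging and splitting of ISODATA; the function
names are hypothetical, the squared Euclidean distance is assumed, and
leaving an empty cluster's mean unchanged is a choice made here for
simplicity.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_means(data, initial_means):
    """Iterate assignment and mean recomputation until assignments stop changing."""
    means = [tuple(m) for m in initial_means]
    assignment = None
    while True:
        # Assign every data point to the nearest current mean.
        new_assignment = [
            min(range(len(means)), key=lambda k: squared_distance(x, means[k]))
            for x in data
        ]
        if new_assignment == assignment:      # converged: nothing changed
            return means, assignment
        assignment = new_assignment
        # Recompute each mean from the points assigned to it.
        for k in range(len(means)):
            members = [x for x, a in zip(data, assignment) if a == k]
            if members:   # assumption: an empty cluster keeps its previous mean
                means[k] = tuple(sum(c) / len(members) for c in zip(*members))


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
means, labels = nearest_means(data, [(0.0, 0.0), (1.0, 1.0)])
print(means, labels)
```

Different initial means can lead to a different local minimum, so in
practice the result depends on how the starting means are chosen.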

Various methods have been proposed to use procedures of the
type just described to find the "correct" or "best" number of clusters
in the data. The algorithms developed by Ball and Hall [3-11] use
merging and splitting to arrive at a final number of clusters. Thus
clusters having variances that are larger than a threshold will be
split and clusters whose means are separated by less than a
threshold will be merged. One major shortcoming of this approach is that
the merging and splitting thresholds must be established a priori.
A procedure for determining these thresholds from the data has been
developed by Fromm and Northouse [3-12].

It has been observed by Nagy [3-13] that the procedures based on
minimization of a distance function, such as the procedure just
described, are most appropriate for fairly isotropic clusters. The
methods which maximize the minimum distance between the members
of distinct clusters are most appropriate for dense, clearly separated
clusters of (perhaps) odd shapes.

3.7 Clustering Quality Measures

For clustering procedures of the nearest means type, the key
obstacle to be overcome is the determination of the "correct" number
of clusters. In addition to the merging and splitting procedures
mentioned previously, it has been suggested that a possible approach
is to obtain a measure of the clustering quality represented by some
parameter alpha [3-2]. This parameter might be expected to vary
with the number of clusters as shown in Figure 3.3.

A number of measures have been proposed for alpha, one of
which is the ratio of the between to within cluster scatter measure
[3-14]. Thus, if it is true that there are intrinsic clusters in the
data, the behavior of alpha would be as follows. If the initial number
of clusters is less than the intrinsic number, L, the within cluster
scatter measure will be large, and alpha will be small. As the
number of clusters is increased, the within cluster scatter measure
decreases rapidly, increasing alpha rapidly. When the intrinsic
number of clusters (L) is reached, the rate at which the within
cluster scatter measure decreases becomes small. The between
cluster scatter measure changes very little after L clusters are
reached, because the new cluster centers are close to the old
cluster centers. Thus, alpha might be expected to behave as shown
in Figure 3.3a.

Figure 3.3 Clustering Quality Measure (horizontal axis: number of clusters; panel (b): poor clustering)

The within-cluster and between-cluster measures are derived
from within-cluster and between-cluster scatter matrices. These
measures are intended to measure the separability of the data [3-14].
The within cluster scatter matrix is based on the scatter of the data
about the cluster means and is given by (3-7)

\[ S_w = \sum_i P(W_i)\, E\{(x - M_i)(x - M_i)^T \mid W_i\} \tag{3-7} \]

where $W_i$ is the $i$th cluster, $P(W_i)$ is the relative frequency (or
probability) of the data in that cluster, and $M_i$ is the $i$th cluster mean.
$E\{\cdot\}$ denotes the expected value or average, and $(\cdot)^T$ denotes the
transpose of the vector quantity in parentheses.

The between cluster scatter matrix can be defined in numerous
ways, but for multi-cluster problems (that is, problems having
more than two clusters) the most straightforward definition is given by

\[ S_b = \sum_i P(W_i)(M_i - M_0)(M_i - M_0)^T \tag{3-8} \]

$M_0$ is the overall expected vector of the entire mixture and is given by

\[ M_0 = E\{x\} = \sum_i P(W_i)\, M_i \tag{3-9} \]

The  goal  of  using  the  scatter  matrices  is  a measure  of  cluster 
separability.  It  is  therefore  necessary  to  derive  a number  from  these 
matrices  which  is  related  to  cluster  separability.  If  this  number  is  to 
behave  like  the  parameter  alpha  discussed  earlier,  it  should  increase 
when  the  within  cluster  scatter  decreases  or  the  between  cluster 
scatter  increases.  There  are  a number  of  ways  of  deriving  such  a 
number,  among  which  are: 


\[ \begin{aligned}
\alpha_1 &= \operatorname{tr}(S_w^{-1} S_b) \\
\alpha_2 &= \ln\{\, |S_w + S_b| / |S_w| \,\} \\
\alpha_3 &= \operatorname{tr} S_b - \mu(\operatorname{tr} S_w - c) \\
\alpha_4 &= \operatorname{tr} S_b / \operatorname{tr} S_w
\end{aligned} \tag{3-10} \]

where tr(·) indicates "trace" or sum of the diagonal elements of a
matrix, and |·| denotes the determinant of the matrix. When $\alpha_3$ is used,
the procedure is to maximize tr $S_b$ subject to tr $S_w$ = c. Here $\mu$ is
the Lagrange multiplier and c is constant.

The terms $\alpha_1$ and $\alpha_2$ are invariant under any non-singular linear
transformation. The terms $\alpha_3$ and $\alpha_4$, while easier to compute,
depend on the coordinate system.
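
Since the traces of the scatter matrices need only the diagonal
elements, the ratio $\alpha_4$ can be computed without forming either
matrix. The following sketch is an illustration under assumptions:
the function name and sample data are hypothetical, and the
expectations in (3-7) are replaced by sample averages with $P(W_i)$
taken as the fraction of points in cluster i.

```python
def scatter_traces(data, labels):
    """Return (tr S_w, tr S_b) for labelled data, using sample averages."""
    n, dim = len(data), len(data[0])
    clusters = {}
    for x, lab in zip(data, labels):
        clusters.setdefault(lab, []).append(x)
    overall_mean = [sum(x[d] for x in data) / n for d in range(dim)]
    tr_sw = tr_sb = 0.0
    for members in clusters.values():
        p = len(members) / n                      # relative frequency P(W_i)
        mean = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        # Average squared distance of the members from their own mean.
        within = sum(
            sum((x[d] - mean[d]) ** 2 for d in range(dim)) for x in members
        ) / len(members)
        tr_sw += p * within
        tr_sb += p * sum((mean[d] - overall_mean[d]) ** 2 for d in range(dim))
    return tr_sw, tr_sb


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
tr_sw, tr_sb = scatter_traces(data, [0, 0, 1, 1])
alpha_4 = tr_sb / tr_sw
print(tr_sw, tr_sb, alpha_4)
```

Tight, well separated clusters give a small within cluster trace and a
large between cluster trace, so $\alpha_4$ is large, as in this example.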

The use of the parameter alpha to measure the "goodness" of
clustering requires that a knee in the alpha vs. number of clusters be
detected (see Figure 3.3). If the data is noisy and the curve is not
smooth, this may be very difficult. A better procedure would be to
observe a parameter beta which passes through a maximum at the
"intrinsic" number of clusters (see Figure 3.4).

A candidate for this measure is

\[ \beta = \operatorname{Tr} S_w \cdot \operatorname{Tr} S_b \tag{3-11} \]

When the number of clusters equals 1, Tr $S_w = \sigma^2$, the variance of
the mixture, Tr $S_b = 0$ and $\beta = 0$. When the number of clusters
equals N, where N is the total number of vectors in the mixture,
Tr $S_w = 0$ and Tr $S_b = \sigma^2$. Hence $\beta = 0$.

This measure is zero at the limiting points of the clustering and
greater than zero in the interval. Therefore, it must attain at least
one (and perhaps several) maximum values somewhere in the interval.
The ideal behavior for $\beta$ would be for it to attain a unique maximum
at a clustering of the data that would be regarded as "good" by a
human observer.

The use of Tr $S_w$ and Tr $S_b$ to define clustering quality implicitly
defines a weighting function $W(n_i)$ on the cluster size. Each term in
the within and between cluster scatter matrices is composed of a
weighted sum of terms. The weighting is based on the relative
frequency (probability) of the data points in each cluster.
The weighting function is depicted in Figure 3.5. Here, $n_i$
is the number of points in the $i$th cluster and N is the total number of
points.

This  weighting  causes  large  clusters  to  have  a greater  effect 
on  the  clustering  quality  measure  than  small  clusters.  The 
probability  weighting  is  correct  if  large  clusters  are  indeed  more 
important  or  if  the  clusters  are  of  approximately  the  same  size. 

If small clusters are of equal importance as large clusters,
the quality measure should be based on a weighting which gives
equal weight to every cluster, regardless of size. This suggests a
uniform weighting as shown in Figure 3.6. Reformulating (3-7) and
(3-8) as

\[ S_w = \sum_i W(n_i)\, E\{(x - M_i)(x - M_i)^T\} \tag{3-12} \]

\[ S_b = \sum_i W(n_i)(M_i - M_0)(M_i - M_0)^T \tag{3-13} \]

maintains the property that the quality measure is not directly
affected by the number of clusters.

Other situations can be postulated. If it is assumed that the
clusters are normally distributed in size about an average value, n,
a Gaussian weighting is suggested. If the criterion is to minimize
the maximum cluster variance, a weighting which is 1 at the
maximum diagonal element of $S_w$ and zero elsewhere is correct. All
of these weightings attempt to quantify in different ways what is
meant by "good clustering."

An interesting relationship holds when the probability weighting
function of Figure 3.5 is used.

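By definition:

\[ \operatorname{Tr} S_b = \operatorname{Tr} \sum_i P(W_i)(M_i - M_0)(M_i - M_0)^T \tag{3-14} \]

\[ = \sum_i P(W_i) \sum_j (m_{ij} - m_{0j})^2 \tag{3-15} \]

\[ = \sum_j \Big[ \sum_i P(W_i)\, m_{ij}^2 - 2 m_{0j} \sum_i P(W_i)\, m_{ij} + m_{0j}^2 \sum_i P(W_i) \Big] \tag{3-16} \]

Noting that:

\[ \sum_i P(W_i) = 1 \tag{3-17} \]

\[ \sum_i P(W_i)\, m_{ij} = m_{0j} \tag{3-18} \]

yields

\[ \operatorname{Tr} S_b = \sum_j \sum_i P(W_i)\, m_{ij}^2 - \sum_j m_{0j}^2 \tag{3-19} \]

Additionally, by definition:

\[ \operatorname{Tr} S_w = \operatorname{Tr} \sum_i P(W_i)\, E\{(x_i - M_i)(x_i - M_i)^T\} \tag{3-20} \]

\[ = \sum_i P(W_i) \sum_j E\{(x_{ij} - m_{ij})^2\} \tag{3-21} \]

\[ = \sum_j \sum_i P(W_i)\, E\{x_{ij}^2\} - \sum_j \sum_i P(W_i)\, m_{ij}^2 \tag{3-22} \]

where $x_i$ denotes a data vector belonging to cluster $W_i$, and $x_{ij}$,
$m_{ij}$ and $m_{0j}$ are the $j$th components of $x_i$, $M_i$ and $M_0$.

Therefore,

\[ \operatorname{Tr} S_b + \operatorname{Tr} S_w = \sum_j \sum_i P(W_i)\, E\{x_{ij}^2\} - \sum_j m_{0j}^2 \tag{3-23} \]

\[ = \sum_i P(W_i) \sum_j E\{x_{ij}^2\} - \sum_i P(W_i) \sum_j m_{0j}^2 \tag{3-24} \]

\[ = \sum_i P(W_i) \sum_j E\{(x_{ij} - m_{0j})^2\} \tag{3-25} \]

\[ = \sum_i P(W_i)\, \operatorname{Tr} E\{(x_i - M_0)(x_i - M_0)^T\} \tag{3-26} \]

\[ = \operatorname{Tr}[\Phi] = K \tag{3-27} \]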

where $[\Phi]$ is the covariance matrix of the data. Thus Tr $S_b$ + Tr $S_w$ =
constant and

\[ \operatorname{Tr} S_b = K - \operatorname{Tr} S_w \tag{3-28} \]

Hence

\[ \beta = \operatorname{Tr} S_b \operatorname{Tr} S_w = (K - \operatorname{Tr} S_w) \operatorname{Tr} S_w \tag{3-29} \]

Differentiating with respect to Tr $S_w$ and setting the derivative equal
to zero yields

\[ \operatorname{Tr} S_w = K/2 \tag{3-30} \]

which implies that $\beta$ achieves a maximum at the clustering that
causes Tr $S_w$ to equal one-half of Tr $[\Phi]$.

Further,

\[ \alpha_4 = \operatorname{Tr} S_b / \operatorname{Tr} S_w \tag{3-31} \]

When Tr $S_w = K/2$,

\[ \alpha_4 = \frac{K - \operatorname{Tr} S_w}{\operatorname{Tr} S_w} \tag{3-32} \]

\[ = \frac{K - K/2}{K/2} = 1 \tag{3-33} \]

Therefore, the ratio of between to within cluster scatter measures
will be exactly 1 at the product maximum.

Knowledge of this relationship is an advantage for real time
applications. Determining the product maximum ($\beta_{max}$) directly
requires that clustering be performed on one greater number of
clusters than the number at which the product maximum occurs in
order to detect a decrease; monitoring the ratio offers a way to
recognize the maximum without that additional clustering.

This relationship does not hold in general but is a phenomenon
which is peculiar to the weighting of terms by the cluster probabilities.
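
The constant sum of the two traces under probability weighting can be
checked numerically. The sketch below is an illustration only; it uses
a single scalar feature, hypothetical function names and made-up data,
with sample averages in place of expectations.

```python
def scatter_traces(values, labels):
    """Return (Tr S_w, Tr S_b) for scalar data with probability weighting."""
    n = len(values)
    overall_mean = sum(values) / n
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    tr_sw = tr_sb = 0.0
    for members in clusters.values():
        p = len(members) / n                      # P(W_i)
        mean = sum(members) / len(members)
        tr_sw += p * sum((v - mean) ** 2 for v in members) / len(members)
        tr_sb += p * (mean - overall_mean) ** 2
    return tr_sw, tr_sb


def total_variance(values):
    """K = Tr[Phi] for scalar data: the variance of the whole mixture."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


values = [0.0, 2.0, 10.0, 12.0]
K = total_variance(values)
for labels in ([0, 0, 0, 0], [0, 0, 1, 1], [0, 1, 0, 1], [0, 1, 2, 3]):
    tr_sw, tr_sb = scatter_traces(values, labels)
    beta = tr_sw * tr_sb
    # The sum tr_sw + tr_sb should match K for every labelling.
    print(labels, tr_sw, tr_sb, tr_sw + tr_sb, K, beta)
```

In this example the product is zero for one cluster and for one point
per cluster, and it can never exceed $(K/2)^2$, the value reached when
the two traces are equal.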

3.8 Feature Selection

Different images can be expected to be segmented most efficiently
by different sets of features, depending on the content of the scene.
Once initial clustering has been performed, it may be desirable to
discard those features not contributing to good clustering and
recluster based on the most important features. In order to accomplish
this, some means for evaluating the usefulness of the features is
required. A related problem is that the features may be highly
correlated in the original space. Thus several highly correlated
features may be evaluated as good while conveying essentially the
same information due to the high degree of correlation. It has been
concluded by Andrews [3-15] that feature selection in an uncorrelated
space is highly desirable.

The criterion of optimality for the selection of a feature set is
the probability of misclassification of the samples. Several
measures have been developed which upper bound the misclassification
rate. Specifically, for a Bayes symmetric cost function
classifier and Gaussian data the error rate has been shown to be
upper bounded inversely as the Bhattacharyya measure [3-16], [3-17],
[3-18].

Hence

\[ P_e \le \sqrt{P(S_1) P(S_2)}\; e^{-B(S_1, S_2)} \tag{3-34} \]

for a two class problem where

\[ B(S_1, S_2) = \tfrac{1}{4} \ln \left| \tfrac{1}{4} \left( [\Phi_2]^{-1}[\Phi_1] + [\Phi_1]^{-1}[\Phi_2] + 2[I] \right) \right| + \tfrac{1}{4} \operatorname{tr}\{ ([\Phi_1] + [\Phi_2])^{-1} (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T \} \tag{3-35} \]

The Gaussian distributions are given by

\[ P(x \mid S_k) = N(\mu_k, [\Phi_k]) \tag{3-36} \]

and $[\Phi_k]$ is the covariance matrix of the $k$th class.

For a multi-class problem, $P_e$ can be bounded as in Eq. (3-34)
by pairwise averaging, i.e.,

\[ P_e \le \sum_{i=1}^{K} \sum_{j>i}^{K} \sqrt{P(S_i) P(S_j)}\; e^{-B(S_i, S_j)} \tag{3-37} \]

Equation (3-35) is called the many-at-a-time form of the
Bhattacharyya distance measure. This equation requires that the
covariance matrix of every class be invertible, a condition which
may not be achievable in practice where the covariance matrices
are sample determined. A computationally more simple form of the
above results when the one-at-a-time form is utilized. This form is
given by:

\[ B_n(S_1, S_2) = \tfrac{1}{4} \ln \left\{ \tfrac{1}{4} \left( \frac{\sigma_1^2(n)}{\sigma_2^2(n)} + \frac{\sigma_2^2(n)}{\sigma_1^2(n)} + 2 \right) \right\} + \tfrac{1}{4} \frac{(\mu_1(n) - \mu_2(n))^2}{\sigma_1^2(n) + \sigma_2^2(n)} \tag{3-38} \]

where n refers to the $n$th dimension of the space. This form involves
only scalar means and variances.
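
The scalar form lends itself to a direct computation. The sketch below
evaluates the two terms of the one-at-a-time measure separately for a
single feature; it is an illustration, the function name and the
numerical values are assumptions, and both variances are assumed to be
strictly positive.

```python
import math


def bhattacharyya_terms(mu1, var1, mu2, var2):
    """Return (variance term, mean term) of the scalar Bhattacharyya measure."""
    variance_term = 0.25 * math.log(0.25 * (var1 / var2 + var2 / var1 + 2.0))
    mean_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)
    return variance_term, mean_term


# Equal variances, unequal means: only the mean term contributes.
print(bhattacharyya_terms(0.0, 1.0, 2.0, 1.0))
# Equal means, unequal variances: only the variance term contributes.
print(bhattacharyya_terms(0.0, 1.0, 0.0, 4.0))
```

Summing the two terms gives the measure for that feature; features
with larger values would be expected to separate the two classes
better.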

Figure 3.7 provides some insight into the behavior of the
one-at-a-time Bhattacharyya measure. When the variances are equal but
the means are not, as in Figure 3.7a, the first term of the
Bhattacharyya measure will be zero but the second term will be
non-zero. The second term will be large if the variance is small under
this condition, implying that a large difference in means accompanied
by small variances is a desirable quality in a feature for
distinguishing between two classes. The situation depicted in Figure 3.7b is
the reverse, that is, the means are equal but the variance is not.
If the variances are significantly different, the feature is still
considered of potential usefulness in separating the classes. Thus,
in this situation, the second term of the Bhattacharyya distance will
be zero but the first term will be non-zero. Finally, in Figure 3.7c
both the mean and variance are unequal and both terms of the
measure will be non-zero.

Figure 3.7 Bhattacharyya Measure Distribution: (a) equal variances, unequal means; (b) equal means, unequal variances; (c) unequal means and variances

Performing feature evaluation in an uncorrelated space
implies that an eigenvector (or discrete Karhunen-Loeve)
transformation is required [3-19]. While the dimensions having the
largest eigenvalues will be the best under certain conditions, they
will not be optimal in general. The one-at-a-time Bhattacharyya
measure will pick the correct eigenvector regardless [3-15].

3.9 Segmentation Comparison Measure

There exist a host of techniques developed over the last few
years for forming clusters and segmenting images. A common
shortcoming is that it is nearly impossible to compare these methods
since no quantitative conditions of optimality exist. Typically in the
literature a statistical or heuristic argument is made that a proposed
method should work well for a particular type of scene. A
mathematical argument sometimes proceeds and classified images are
displayed to support the predicted performance. Virtually no means
exist for comparing performance among methods and it is suspected
that no one method works well for all types of scenes.

One remedy for this shortcoming would be a standard data set of
segmented pictures that human observers agreed
were "correctly" segmented. If pictures segmented by a candidate
procedure could be compared to the standard segmented data base, a
primitive means would then exist for comparing different
segmentations of the same scene. A comparison measure that is proposed is

\[ C = \left[ \sum_{i,j} \frac{h_{\max}(i,j)}{N} - \frac{1}{I \cdot J} \min(I, J) \right] F \tag{3-39} \]

\[ F = \frac{100}{1 - \frac{1}{I \cdot J} \min(I, J)} \]

I and J are the total number of segments in images 1 and 2
respectively. $h_{\max}(i,j)$ are the elements of the joint histogram of the
segmented images which are maximum in both the rows and the columns
of the joint histogram. F is a normalizing factor which forces C to be
bounded as $0 \le C \le 100$ and N is the total number of pixels in each
image.

Pictures which are identical will have C = 100% while pictures
which are completely unrelated and have equal sized segments will
have a uniform joint histogram and have C = 0%. This measure
penalizes the segmented images for having non-equal numbers of
segments. Pixels in the image having the greater number of segments
that are located in the smaller sized segments are regarded as being
misclassified.

As an example of the behavior of this measure, consider the
segmentations of Figure 3.8. The joint histogram of these
segmentations is also given in Figure 3.8. The comparison measure between
these two segmentations will be:

\[ C = \left[ \frac{64}{100} + \frac{24}{100} - \frac{1}{6} \cdot 2 \right] F = 0.547 \times F \]

\[ F = \frac{100}{1 - \frac{1}{6} \cdot 2} = 150 \]

\[ C = 82\% \]
As another example of the behavior of this measure, suppose
that the two segmentations to be compared were similar to
segmentation number 1 except that segment number 2 is of different
size. When segment number 2 consists of only one pixel, the
comparison measure will equal 47.5%. At the other extreme, where
segment number 2 consists of all but 1 pixel of the segmentation,
the comparison measure will equal 46.0%. The measure will, in
this case, reach a maximum of 100% when segment number 2 is of
equal size in both segmentations. If both segmentations are
completely unrelated and have equal sized segments, the joint histogram will
be uniform and the maximal entries will sum, after division by N, to
$\frac{1}{I \cdot J} \min(I, J)$, where I and J are
the number of segments in pictures 1 and 2 respectively. The
comparison measure will equal 0% in this case.

Chapter 4

IMAGE SEGMENTATION BY CLUSTERING - AN APPROACH

4.1 Overall Approach

The overall approach taken to segment images by clustering is
depicted in the general block diagram of Figure 4.1. The feature
computation block computes several features at each pixel. These
features are related to brightness and texture at several window
sizes centered on every pixel.

The feature decorrelation is performed by a multidimensional
axis rotation (Karhunen-Loeve transformation). The rotation is
performed so that the new feature set is uncorrelated.

Feature  reduction,  which  is  accomplished  subsequently,  will 
retain  only  those  features  necessary  for  good  clustering.  If  feature 
reduction  is  not  performed  on  decorrelated  features,  several  highly 
correlated  features  may  be  retained,  but  convey  essentially  the  same 
information. 

The  feature  reduction  is  accomplished  by  performing  clustering 
on  the  full  feature  set  on  a sample  basis.  In  other  words,  only 
samples  of  the  image  are  used  for  clustering  to  reduce  the  time 


discarded. The optimality criterion will be discussed in greater
detail subsequently.

Figure 4.1 General Block Diagram (input image to output segmentation). Block annotations: potential focal plane implementation; features are brightness, edges, texture, etc.; makes features independent (can be optional); removes features not contributing to good clustering (can be optional); finds inherent homogeneous regions of the image automatically; outputs regions of common homogeneity to a constant value.

Clustering  is  again  performed  on  the  reduced  feature  set,  on  a 
sample  basis  as  before.  When  the  optimum  number  of  clusters  is 
determined,  the  cluster  means  are  forwarded  to  the  segmentation 
phase  of  the  algorithm.  The  segmentation  phase  assigns  every 
pixel  (vector)  in  the  image  to  the  closest  cluster  mean  received 
from  the  clustering  algorithm.  Thus,  while  the  optimum  number  of 
clusters  and  the  cluster  means  are  determined  on  a sample  basis, 
the  segmentation  is  performed  on  the  entire  image. 

The algorithm illustrated in Figure 4.1 was adopted for several
reasons. Clustering on a sample of the image is a factor of 16
faster than the same procedure performed on the full set of data.
The segmentation, however, retains most of the original resolution
since it is performed on the full set of image data.
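
The division of labor between sampled clustering and full-resolution
segmentation can be sketched as follows. This is an illustration
under assumptions: the function names are hypothetical, the image is a
list of rows of feature tuples, and a sampling step of 4 in each
direction is assumed because it yields the factor of 16 quoted above.

```python
def sample_features(features, step=4):
    """Keep every `step`-th pixel of every `step`-th line (1/16 of the data for step=4)."""
    return [row[::step] for row in features[::step]]


def nearest_mean_index(vector, means):
    """Index of the cluster mean closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(means)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(vector, means[k])),
    )


def segment_image(features, means):
    """Assign every pixel of the full image to its closest cluster mean."""
    return [[nearest_mean_index(v, means) for v in row] for row in features]


# An 8 x 8 image with one feature: dark left half, bright right half.
features = [[(0.0,)] * 4 + [(10.0,)] * 4 for _ in range(8)]
sample = sample_features(features)           # 2 x 2 sample used for clustering
means = [(0.0,), (10.0,)]                    # means assumed to come from clustering the sample
print(len(sample) * len(sample[0]), segment_image(features, means)[0])
```

The cluster means would come from clustering the sample; the final
pass touches every pixel once, so its cost grows only linearly with
the image size.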

The feature decorrelation is necessary in order that the feature
reduction will retain the minimum number of features contributing
to good clustering. The feature reduction improves the quality of
the segmentation by discarding noisy and less useful features. The
first clustering is performed explicitly for the purpose of evaluating
the features. The algorithm iterates to a "correct" number of
clusters, and the features are evaluated at that point. The second
clustering is performed for the purpose of finding the means with
which to segment the image in the segmentation phase of the algorithm.

A detailed flow diagram of the algorithm is illustrated in Figure
4.2. The feature computation, which will be described in detail
subsequently, produces, as described previously, a vector at each
pixel location. These vectors are forwarded to the covariance
computation routine and to the Karhunen-Loeve rotation.

4.2 Feature Rotation

The covariance matrix is computed over the feature set as

$$ [\Phi(i,j)] = \left\langle (p_i(x,y) - \bar{p}_i)(p_j(x,y) - \bar{p}_j) \right\rangle_{x,y} \tag{4-1} $$

Here $\langle \cdot \rangle$ denotes averaging and $\bar{p}_i$ is the
average of the $i$th feature over the image. The average is performed
on every fourth pixel and every fourth line to reduce computation time.
The diagonal elements of this matrix are the feature variances over the
image. The matrix which diagonalizes the covariance matrix is computed,
yielding

$$ A^T \Phi A = \Lambda \tag{4-2} $$

where $\Lambda$ is diagonal, having the eigenvalues of the covariance
matrix as diagonal elements, i.e.,

$$ \Lambda = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_N \end{bmatrix} \tag{4-3} $$

[Figure 4.2 Flow Diagram of Image Segmentation Algorithm. Legend text
recovered: "Implies Scalar Data".]


The matrix $A$ which accomplishes this diagonalization is the
well-known matrix of eigenvectors, i.e.,

$$ A = [a_1 \; a_2 \; \cdots \; a_N] \tag{4-4} $$

where $a_i$ is an eigenvector, i.e.,

$$ \Phi a_i = \lambda_i a_i \tag{4-5} $$

A new feature set is computed by multiplying every vector in
the original space by $A^T$, i.e.,

$$ \bar{q}(x,y) = A^T \bar{p}(x,y) \tag{4-6} $$

This transformation corresponds to a multidimensional axis
rotation and is the discrete form of the Karhunen-Loeve
transformation. The covariance matrix in the rotated space will be
diagonal and will be

$$ [\Phi_R(i,j)] = A^T [\Phi(i,j)] A = \Lambda \tag{4-7} $$

This rotated space of features is forwarded to the clustering algorithm
for clustering.
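The rotation of equations (4-1) through (4-7) can be illustrated with a
short NumPy sketch. It is an assumption-laden example rather than the
original routine: the function name `kl_rotate`, the array layout, and
the explicit sort into descending eigenvalue order are choices made
here for illustration.

```python
import numpy as np

def kl_rotate(features, step=4):
    """Rotate feature vectors onto the eigenvectors of their covariance.

    features: (rows, cols, n_features) array. The covariance is estimated
    on every `step`-th line and pixel, as in the text.
    """
    n = features.shape[-1]
    sample = features[::step, ::step, :].reshape(-1, n)
    cov = np.cov(sample, rowvar=False, bias=True)   # equation (4-1)
    eigvals, A = np.linalg.eigh(cov)                # columns of A are eigenvectors
    order = np.argsort(eigvals)[::-1]               # descending eigenvalue
    eigvals, A = eigvals[order], A[:, order]
    rotated = features @ A                          # q = A^T p at every pixel
    return rotated, A, eigvals
```

Because `A` is orthogonal, the covariance of the rotated sample equals
the diagonal matrix of eigenvalues up to round-off, which is the content
of equation (4-7).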

4.3 Clustering Algorithm

The clustering algorithm uses the k-means algorithm for
2, 3, 4, ..., 16 clusters. At each step, the quality of clustering is
computed as $\beta = \mathrm{Tr}\,S_b \cdot \mathrm{Tr}\,S_w$ (see
Ch. 3 and equations (3-3), (3-4) and (3-7)). The average pairwise
Bhattacharyya distance is computed for every feature. At the product
maximum, the Bhattacharyya distance for all features is computed.
Features having a Bhattacharyya distance which exceeds the overall
average are identified for use in the final clustering. Since these
features are uncorrelated, only the minimum necessary are retained for
good clustering. The flowchart of the algorithm is shown in
Figure 4.3.
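The feature selection rule can be sketched as below. The exact
Bhattacharyya expression used in the dissertation is defined in
Chapter 3 and is not reproduced in this section, so the sketch assumes
the standard closed form for two univariate Gaussian clusters; the
function names and the small variance floor `eps` are hypothetical.

```python
import numpy as np
from itertools import combinations

def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya distance between two univariate Gaussians (assumed form)."""
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))

def select_features(samples, labels, eps=1e-6):
    """Keep features whose average pairwise distance exceeds the overall average.

    samples: (n_samples, n_features); labels: (n_samples,) cluster labels.
    Returns (indices of retained features, per-feature average distance).
    """
    clusters = np.unique(labels)
    means = np.array([samples[labels == c].mean(axis=0) for c in clusters])
    varis = np.array([samples[labels == c].var(axis=0) for c in clusters]) + eps
    pairs = combinations(range(len(clusters)), 2)
    avg = np.mean([bhattacharyya_1d(means[a], varis[a], means[b], varis[b])
                   for a, b in pairs], axis=0)
    return np.flatnonzero(avg > avg.mean()), avg
```

A feature that separates the clusters well receives a large average
distance and is retained; a feature that looks the same in every
cluster falls below the overall average and is dropped.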

The vectors are assigned to the nearest cluster mean in
accordance with the distance measure, i.e.,

$$ d_1(\bar{p}^{(1)}, \bar{p}^{(2)}) = \sum_i \left| p_i^{(1)} - p_i^{(2)} \right| \tag{4-8} $$

The clustering of spatial sources has been shown to be very
insensitive to the distance measure used [4-1]. Therefore, the absolute
value measure was chosen over the distance measure

$$ d_2(\bar{p}^{(1)}, \bar{p}^{(2)}) = \sum_i \left( p_i^{(1)} - p_i^{(2)} \right)^2 $$

to reduce the computation time required.

The clustering algorithm computes the cluster means on every
fourth line and every fourth pixel to further reduce computation
time. For a given number of clusters, the algorithm iterates until
it converges. Convergence is assumed to have been reached when
the means on the (K-1)th iteration and the means on the Kth iteration
differ by less than one brightness level in any dimension for any
cluster. This is equivalent to

$$ \left| C_j^{(K)}(i) - C_j^{(K-1)}(i) \right| < 1 \quad \text{for all } i \tag{4-10} $$

Since this is the limiting resolution of the data, further iterating is
performed on the quantization noise of the data and does not yield
results which will significantly affect the segmentation.

[Figure 4.3 Clustering Algorithm Flow Chart]
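The iteration and its stopping rule can be sketched as follows, assuming
the samples are already the every-fourth-line, every-fourth-pixel
subset. The function name, the `max_iter` safeguard, and the handling of
empty clusters are assumptions added for the example.

```python
import numpy as np

def nearest_means(samples, means, max_iter=100):
    """Iterate until no mean component moves by one brightness level or more.

    samples: (n_samples, n_features); means: (k, n_features) starting centers.
    Returns (final means, labels from the last assignment pass).
    """
    means = np.asarray(means, dtype=float).copy()
    for _ in range(max_iter):
        # Absolute-value distance of every sample to every mean.
        dist = np.abs(samples[:, None, :] - means[None, :, :]).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_means = means.copy()
        for j in range(len(means)):
            members = samples[labels == j]
            if len(members):              # an empty cluster keeps its old mean
                new_means[j] = members.mean(axis=0)
        converged = np.all(np.abs(new_means - means) < 1.0)   # equation (4-10)
        means = new_means
        if converged:
            break
    return means, labels
```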

The algorithm begins at 2 clusters. The initial means are
established by computing the mean and variance of each feature over
the image. The 2 initial cluster centers are chosen as

$$ C_K^{(j)} = \bar{C}_K + (V_K)^{1/2} \, (2(j-1) - 1) \tag{4-11} $$

where $\bar{C}_K$ is the mean of the $K$th feature, $V_K$ is the
variance of the $K$th feature and $j = 1, 2$ is the appropriate cluster
number. Equation (4-11) places the initial cluster centers evenly
spaced on the diagonal of positive correlation at plus and minus 1
standard deviation in the hyper-space of the feature set. As the number
of clusters is incremented, the new cluster center is initialized at
the vector whose distance from its respective cluster center is the
greatest.
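A minimal sketch of the two initialization rules follows. The function
names are hypothetical, and the use of the absolute-value distance to
find the farthest vector is an assumption carried over from
equation (4-8).

```python
import numpy as np

def initial_two_means(samples):
    """Two starting centers at the feature means minus and plus one std dev."""
    mean, std = samples.mean(axis=0), samples.std(axis=0)
    # j = 1 gives the factor -1, j = 2 gives +1.
    return np.array([mean + std * (2 * (j - 1) - 1) for j in (1, 2)])

def add_cluster_center(samples, means):
    """Append a new center at the sample farthest from its own cluster mean."""
    dist = np.abs(samples[:, None, :] - means[None, :, :]).sum(axis=-1)
    farthest = dist.min(axis=1).argmax()
    return np.vstack([means, samples[farthest]])
```

Seeding the new cluster at the worst-represented vector gives the next
clustering pass a starting point in the region the current means
describe least well.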

Final segmentation is performed on every pixel, utilizing the
means or cluster centers computed during the clustering algorithm.
This procedure permits segmentation of the image to nearly the
original resolution, while performing the tedious computations on
one-sixteenth of the data.


4.4 Feature Computation

An  aspect  of  clustering  which  has  a major  effect  on  the  results  is 
the  feature  set  used  to  describe  the  image.  While  this  research  was 
not  intended  to  probe  deeply  into  an  optimal  feature  set,  some 
exploration  of  the  subject  was  necessary  to  permit  development  of 
the  clustering  procedure.  For  monochrome  imagery,  the  most 
obvious  features  that  are  intuitively  important  to  human  observers 
are  brightness  and  texture.  Brightness  is  a relatively  straight- 
forward concept,  but  texture  is  not.  Much  research  has  been 
performed  regarding  human  perception  of  texture,  and  the  subject 
is  far  from  closed. 

To date, the most promising results obtained with texture
operators utilize the grey level dependency matrices proposed by
Haralick [4-2]. The normal approach followed with these measures
is to compute the grey level dependency matrices and then to derive
texture measures from the matrices themselves. A large number of
measures can be computed from these matrices, but Thompson [2-2]
found that perhaps 5 or fewer correlate significantly with human
perception. If the original 256 possible brightness levels in the
original picture are quantized to 16 levels, and if 4 angles and 4
distances are used, then 16 matrices of size 16 x 16 must be computed
at every pixel. This amount of computation was considered excessive
in view of the goal that this procedure is intended to be reasonably
fast.


Another texture measure which has been proposed is "edges per unit
area," a measure of the local edge density. This measure was computed
and used in segmenting several types of scenes. The basic edge
detector is the Sobel operator, which is defined as follows:

$$ [S_1] = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix} \quad \text{and} \quad [S_2] = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix} $$

At each pixel, the image is multiplied by the $[S_1]$ and $[S_2]$
masks, yielding $S_1$ and $S_2$. The Sobel magnitude is then defined as

$$ SM = (S_1^2 + S_2^2)^{1/2} $$


and  the  Sobel  phase  is  given  as 


SP  = arctan 


These measures are designed to detect well defined edges. As a
result, they tend to have large values at such edges and very low
values elsewhere. When quantized to 8 bits, much of the region is
quantized to zero and only clearly defined edges remain visible. For
that reason, the logarithm of the magnitude was taken to expand the
lower range of values, i.e.,

$$ SM^* = \log(1 + SM) $$

The Sobel phase has the opposite shortcoming. It will have large
(although nearly random) values in regions where no discernible
texture exists. The Sobel phase texture operator was therefore
computed as

$$ SP^* = (SP + \pi)\, SM^* \tag{4-16} $$

in an attempt to suppress the phase operator when no texture is
present. These primitive operators permitted segmentation of
numerous monochrome images, with varying degrees of success.
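The two Sobel-based texture operators can be sketched as below. Two
points are assumptions: the argument of the arctangent is not legible in
this copy, so the sketch uses `np.arctan2(s2, s1)`, and border pixels
are simply left at zero. The helper names are hypothetical.

```python
import numpy as np

S1_MASK = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
S2_MASK = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)

def apply_mask(image, mask):
    """Sum of mask times the 3 x 3 neighborhood at each interior pixel."""
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for dr in range(3):
        for dc in range(3):
            out[1:-1, 1:-1] += mask[dr, dc] * image[dr:rows - 2 + dr, dc:cols - 2 + dc]
    return out

def sobel_features(image):
    """Return (log Sobel magnitude, phase weighted by log magnitude)."""
    image = np.asarray(image, dtype=float)
    s1, s2 = apply_mask(image, S1_MASK), apply_mask(image, S2_MASK)
    sm = np.sqrt(s1 ** 2 + s2 ** 2)
    sm_star = np.log(1.0 + sm)            # expands the low end of the range
    sp = np.arctan2(s2, s1)               # assumed argument, range [-pi, pi]
    sp_star = (sp + np.pi) * sm_star      # equation (4-16)
    return sm_star, sp_star
```

In flat regions the magnitude is zero, so the weighted phase is also
zero, which is the suppression the text describes.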

The goal of the algorithm developed here is to perform gross
overall scene segmentation. For this reason, very small "fine
grain" segments were considered undesirable. It was decided to
perform a prefiltering to make some basic decisions about region
character on a local level as a first step prior to segmentation.

Linear operators tend to blur the region boundaries and reduce
the region boundary resolution. The Tukey median filter [4-3],
while much better in this respect than a local averaging, still causes
some blurring of the boundaries. A filter that does not blur the
boundaries was conceived and called a "mode filter." This filter
computes a local area histogram centered on each pixel, for
different region sizes, and outputs the mode or most frequently
occurring value. The height of the histogram at each pixel may also
be used as a measure of the local dispersion of the region. Region
sizes of 3 x 3, 7 x 7 and 15 x 15 were computed. The local area
histogram was computed to N levels, where N is the linear dimension
of the window.

The effect of the mode filter is to replace every pixel with the
most frequently occurring value in a small region centered on the
pixel. This removes small variations in brightness and tends to
create relatively large regions of completely uniform character.

The disadvantage of the mode filtering is that it creates artificial
boundaries when regions of slowly varying brightness cross a
threshold of the histogram. The resulting image looks much like a
"paint-by-numbers" painting.
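A direct (unoptimized) sketch of the mode filter and its dispersion
output is given below. It assumes 8-bit input, quantizes the local
histogram to `levels` bins (defaulting to the window dimension N as the
text states), returns the mode as a bin index rather than a brightness
value, and leaves border pixels at zero. All names are hypothetical.

```python
import numpy as np

def mode_filter(image, n, levels=None):
    """Return (mode, height) of the local n x n histogram at interior pixels.

    image: 2-D array of integers in 0..255.
    mode: index of the most populated histogram bin in the window.
    height: number of window pixels in that bin (the dispersion feature).
    """
    levels = n if levels is None else levels
    image = np.asarray(image)
    bins = (image.astype(int) * levels) // 256      # quantized grey level
    half = n // 2
    mode = np.zeros(image.shape, dtype=int)
    height = np.zeros(image.shape, dtype=int)
    for r in range(half, image.shape[0] - half):
        for c in range(half, image.shape[1] - half):
            window = bins[r - half:r + half + 1, c - half:c + half + 1]
            counts = np.bincount(window.ravel(), minlength=levels)
            mode[r, c] = counts.argmax()
            height[r, c] = counts.max()
    return mode, height
```

An isolated bright pixel in a dark region is outvoted by its neighbors
and disappears, while the output on either side of a step edge switches
only where the window center crosses the boundary.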

The mode filter causes almost no loss in boundary resolution
because the output of the filter does not change until a majority
of the values change. Then the filter output changes value at the
point where the center of the window crosses the region boundary.
The square window will clip corners of square regions, however.
The dispersion or histogram height does not have the nice properties
of the mode value. The height of the mode will decrease as a region
boundary is approached, and increase as the region boundary is left.
Some blurring of region boundaries will therefore occur.

If the region size to be detected is near the size of the mode
filter window, it will pass undetected (or be severely reduced) by the
filtering process. The window size must therefore be selected to be
smaller than the smallest region size to be detected.

Color  features  were  computed  by  performing  mode  filtering  on 
the  three  spectral  images  (red,  green  and  blue).  This  expanded 
feature  set  produced  more  subjectively  satisfying  results,  as  would 
be  expected  from  the  increased  information  available  from  the  multi- 
spectral  data. 

There are generally five feature sets which are used for image
clustering. These feature sets are summarized in Table 4.1. These
feature sets are clearly combinations of a few different types of
basic features, and are based on the Sobel operators and the mode
dispersion for texture information and on the mode filter for
brightness information.

The original set was feature set number 1. Feature set number 2
was motivated by the attempt to obtain a more satisfying result from
the aerial image. In the case of polychromatic imagery, feature set
number 3 is basically similar to feature set number 2 for
monochrome imagery, with the larger (15 x 15) mode filters eliminated
to conserve computer time. Feature set number 4 is an obvious choice,
and feature set number 5 was chosen in an attempt to introduce
texture information into the segmentation process for the ten band
multispectral image.

Table 4.1. Feature Set Descriptions

MONOCHROME IMAGERY

Feature Set Number 1

Feature No.      Description
1                Original
2                Log Sobel Magnitude
3                Sobel Phase x Log Sobel Magnitude
4                Feature 1 Mode Filtered, 3 x 3
5                Feature 2 Mode Filtered, 3 x 3
6                Feature 3 Mode Filtered, 3 x 3
7                Feature 1 Mode Filtered, 7 x 7
8                Feature 2 Mode Filtered, 7 x 7
9                Feature 3 Mode Filtered, 7 x 7
10               Feature 1 Mode Filtered, 15 x 15
11               Feature 2 Mode Filtered, 15 x 15
12               Feature 3 Mode Filtered, 15 x 15

Feature Set Number 2

Feature No.      Description
1                Original Mode Filtered, 3 x 3
2                Dispersion of Feature 1, 3 x 3
3                Original Mode Filtered, 7 x 7
4                Dispersion of Feature 3, 7 x 7

POLYCHROMATIC IMAGERY

Feature Set Number 3

Feature No.      Description
1                Red Original
4                Red Original Mode Filtered, 3 x 3
5                Dispersion of Feature 2, 3 x 3
10               Red Original Mode Filtered, 7 x 7
11               Dispersion of Feature 4, 7 x 7
2, 6-7, 12-13    Similar to above for Green
3, 8-9, 14-15    Similar to above for Blue

Feature Set Number 4

Feature No.      Description
1 - 10           Ten unmodified bands of multispectral imagery

Feature Set Number 5

Feature No.      Description
1 - 2            Best two rotated features of 10 band multispectral imagery
3                Feature 1 Mode Filtered, 3 x 3
4                Dispersion of Feature 3, 3 x 3
5                Feature 2 Mode Filtered, 7 x 7
6                Dispersion of Feature 5, 7 x 7

As has been previously discussed, the primary goal of this effort
was to develop the segmentation algorithm. Feature experimentation
was done as required to investigate the performance of the
segmentation algorithm, but was not pursued in great depth as a topic
having its own merit. A great deal of investigation into features,
especially texture, obviously remains to be done.


Chapter 5

EXPERIMENTAL RESULTS

An enormous amount of data was collected during the performance
of numerous experiments in segmenting various kinds of images. A
representative sampling of that data is included in the photographs,
charts and tables at the end of this chapter. The photographic data
consists of photographs of the features, both correlated and
decorrelated, and the resulting segmentations. The graphs depict
behavior of the Bhattacharyya distance measure, used for feature
selection, and of the clustering quality measure, used to stop the
algorithm at the "correct" number of clusters. The tables consist of
the statistical data used to decorrelate the features for the various
images, a comparison of the Bhattacharyya distance with the
decorrelated feature eigenvalues, results of running the segmentation
comparison measure and a table of computer time required to run the
algorithm in its various configurations. The table of Bhattacharyya
distance measure versus feature eigenvalues illustrates the
superiority of this measure to the eigenvalues in identifying the best
features.

5.1 APC Image Results

Examples of features and segmented images are shown in Figures 5.1
through 5.7. Figure 5.1 consists of the 12 original features
computed from the APC image. These features were subjected to
clustering without rotation, producing the segmentations of Figure
5.2. The probability weighted product maximum occurred at 9 clusters.
A graph of the average Bhattacharyya distance versus the number of
clusters for this image is shown in Figure 5.8. This graph is
constructed such that the average Bhattacharyya distance for each
feature is normalized by the average for all features at each number
of clusters. The normalized overall average therefore consists of the
horizontal line at d = 1.0. While there is some changing of relative
position between the features as the number of clusters is varied,
those features which are above average tend to remain above average,
and those which are below average tend likewise to remain below
average. The graph shows reasonably consistent behavior of the
Bhattacharyya distance measure as the number of clusters varies.
Thus feature selection based on this measure is a consistent
procedure. The probability weighted product of the between cluster
scatter measure and the within cluster scatter measure was computed
for each number of clusters. The between and within scatter measures
are normalized by r so that they range between 0 and 1. These
products are plotted versus the number of clusters for the APC image
under various conditions as well as for several other images in
Figure 5.9. At the probability weighted product maximum for the
original APC features (9 clusters in this case), the above average
features were 7, 1, 4 and 10 in that order. These features are
original mode filtered 7 x 7, original unmodified, original mode
filtered 3 x 3 and original mode filtered 15 x 15 respectively. Thus
all of the texture information has been discarded. These features
were used to again cluster the image and the results are shown in
Figure 5.2. The probability weighted product maximum occurred at 2
clusters for the reduced feature set.
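The quality measure plotted in Figure 5.9 can be illustrated with the
sketch below. The precise definitions are in Chapter 3 (equations
(3-3), (3-4) and (3-7)), which is not reproduced here, so this example
assumes the standard probability-weighted within-cluster and
between-cluster scatter matrices and assumes that each trace is
normalized by the trace of the total scatter. The function name is
hypothetical.

```python
import numpy as np

def scatter_product(samples, labels):
    """Product of normalized within- and between-cluster scatter traces.

    samples: (n_samples, n_features); labels: (n_samples,) cluster labels.
    """
    overall = samples.mean(axis=0)
    tr_w = tr_b = 0.0
    for c in np.unique(labels):
        members = samples[labels == c]
        prob = len(members) / len(samples)          # cluster probability
        tr_w += prob * members.var(axis=0).sum()    # trace of within scatter
        tr_b += prob * ((members.mean(axis=0) - overall) ** 2).sum()
    total = tr_w + tr_b                             # equals the total variance
    return (tr_w / total) * (tr_b / total)
```

Under these assumptions the two normalized traces sum to one, so the
product can never exceed 0.25, and it falls toward zero both when there
are too few clusters (almost all scatter is within) and when there are
too many (almost all scatter is between).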

The covariance matrix of the 12 original feature set was computed
and diagonalized. The covariance matrix and the diagonalization
(eigenvector) matrix as well as the eigenvalues of the covariance
matrix are tabulated in Table 5.1. The eigenvectors are shown as
column vectors in the table. In the case of the APC image, rotated
feature number 1 consists of -0.37 x original feature number 1 plus
0.13 x original feature number 2, etc. Each vector of the rotated
feature set is computed in this manner from the spatially
corresponding vector in the original feature set. The actual rotated
feature set is shown in Figure 5.3. The version shown in Figure 5.3
has been rescaled for ease of viewing. The set of 12 features used for
clustering was rescaled as a set to cover the range 0 to 255. The
feature set displayed in Figure 5.3 has had each feature individually
rescaled to cover 0 to 255 for viewing convenience. The covariance
matrix of the rotated features is diagonal and each diagonal entry is
equal to the variance for the respective feature. The features are
arranged approximately in order of descending eigenvalue (variance) by
the computer routine that diagonalizes the covariance matrix. The
lower variance (energy) of the higher number rotated features is
evident from their appearance. The columns of the rotation matrix are
the eigenvectors of the covariance matrix corresponding to the
eigenvalues listed above it. The average Bhattacharyya distance for
each feature is listed in Table 5.2, along with the eigenvalues and
the rank of the feature with respect to average Bhattacharyya
distance. It can be seen that the relative eigenvalue does not exactly
correspond to the average Bhattacharyya distance.

An interesting phenomenon can be observed in the behavior of the
clustering quality measure (product) in Figure 5.9. The behavior of
the quality measure for the rotated and non-rotated feature sets is
almost identical, which is to be expected if the intrinsic structure
of the data is unchanged by the feature rotation process. The
clustering quality measure maximum is rather broad in both cases for
the full feature sets. The reduced feature sets, on the other hand,
show a sharper, more clearly defined peak in the quality measure,
suggesting that the intrinsic clusters in the data are more clearly
defined in the reduced sets of features. For all images tested, the
quality measure tended to demonstrate a more clearly defined maximum
when computed on feature sets that were expected to yield "better"
clustering.


The segmentation at the probability weighted product maximum
for the 12 non-rotated features (Figure 5.2) and the segmentation
at the probability weighted product maximum for the 12 rotated
features (Figure 5.4) appear very similar. That this is so is
expected, since the multidimensional rotation of the axes by the
rotation matrix is a linear invertible map and should not change the
shape of the clusters. The differences which do exist are due in small
part to numerical (round off) errors in the computation and it is
conjectured that they are due in larger part to the fact that the
clustering algorithm is somewhat sensitive to cluster initialization.
The nearest means algorithm will converge to a local minimum in the
average inter-cluster distance. In addition, since convergence of the
algorithm for a fixed number of clusters is considered to occur when
the means change less than one brightness value in any dimension, the
final clustering is also slightly sensitive to the direction from
which the convergence is approached. Nevertheless, the agreement is
surprisingly good, and supports the hypothesis that intrinsic clusters
do in fact exist in the data.

The average Bhattacharyya distances were computed for the rotated
features and are plotted in Figure 5.8. The above average features at
the probability weighted product maximum (4 in this case) were used to
again cluster the image. The results of this are shown in Figure 5.4.
The comparison between these segmentations and the segmentations
performed with the 4 best non-rotated features is interesting. The
feature reduction in the non-rotated space retained all of the
brightness related features and none of the texture related features.
Therefore, the 4 retained features were highly correlated and all of
the texture information was lost. The feature reduction in the rotated
space, on the other hand, discarded only non-correlated information.
The features that remained can be expected to contain most of the
information necessary for clustering. The segmentation that resulted
from the 4 best rotated features differs from the segmentation of the
4 non-rotated features mainly in that the background has been split
into two segments. Otherwise, the segmentations are very similar and
suggest that the intrinsic structure of the data has been retained.

The best of the rotated features was substantially higher in
Bhattacharyya distance than any of the other features. This is to some
extent expected, since the rotation process will compact the maximum
amount of information into the features having the largest
eigenvalues. Accordingly, it was decided to perform clustering on this
one exceptionally good feature. The results of this are also shown in
Figure 5.4. The classification of the bushes in the images as being
the same as the vehicle constitutes an error or misclassification in
the process. Substantially more errors were made when the segmentation
was done with only one feature, which is expected due to the large
reduction in dimension.

5.2 Aerial Image Results

The segmentation procedure was applied to an aerial image. The
original as well as the segmentation results are shown in Figure 5.5.
The photograph labeled "5 best of 12 rotated features" was segmented
using the same 12 features used for the APC image. This image was
also segmented using a different feature set consisting of 4 features.
To attempt to measure the local dispersion of grey level values, the
height of the local histogram mode as well as the grey scale value of
the mode was used as a feature. The height was rescaled to 0 to 255
before clustering. An example of the dispersion feature for a 3 x 3
window is shown in Figure 5.5. The set of four features consisted of
the mode and dispersion for a 3 x 3 window and the mode and dispersion
for a 7 x 7 window. These features were rotated; clustering was
performed; the above average features (1 in this case) were selected
and clustering was again performed. The results of this segmentation
are labeled "1 Best of 4 Rotated Features" in Figure 5.5. Covariance
and rotation matrices, eigenvalues and Bhattacharyya distances, and
probability weighted product behavior for this image are tabulated in
Tables 5.1, 5.2 and Figure 5.9 respectively.

5.3 House Multispectral Images Results

The segmentation procedure was also applied to polychromatic
imagery. It would be expected that somewhat improved results would
be obtained from the expanded feature set, and the results seem to
confirm this expectation. The first polychromatic image to which the
procedure was applied was a color image of a house. The red, green
and blue original images for this picture are shown in Figure 5.6.
The feature set consisted of the mode and dispersion for each of the
three color planes, in both 3 x 3 and 7 x 7 windows. Thus there were
4 features plus the original for each of the three colors, yielding 15
features. Examples of the dispersion feature for the red image are
shown in Figure 5.6. The results of the segmentation for the full
feature set and the best (1 in this case) feature are shown in Figure
5.6 and are remarkably similar. The windows were classified as "sky"
because the sky is reflected in them. The covariance and rotation
matrices, eigenvalues and Bhattacharyya distances and probability
weighted product behavior for this image are tabulated in Tables 5.1,
5.2 and Figure 5.9 respectively.

A series of 10 multispectral images was also used for segmentation.
Two bands of the original set and the resulting segmentations
are shown in Figure 5.7. The covariance matrix of the rotated feature
set was approximately singular, indicating that the multispectral data
is highly redundant. The variance of the higher numbered features was
low and approximately the same value as the off-diagonal elements in


had very little energy. As a result, the best two features selected
by the algorithm included a feature which had extremely low variance
and the resulting segmentation looked almost identical to the
segmentation produced by the full 10 feature set. In an attempt to
improve performance, the best two features having significant energy
(variance) were used to produce the segmentation labeled "2 Rotated
Features-Augmented." The augmentation consisted of the mode and
dispersion computed on each of the "hand selected" features. The
covariance and rotation matrices, eigenvalues and Bhattacharyya
distances, and probability weighted product behavior are tabulated in
Tables 5.1, 5.2 and Figure 5.9 respectively.

5.4 Computer Time Required

The computer time, in CPU minutes, for various steps in this
procedure is tabulated in Table 5.3. The "full runs" are permitted
to run out to 16 clusters, regardless of the occurrence of a product
maximum. The "abbreviated run" consists of stopping the clustering
algorithm when the product maximum is realized. This occurs when
clustering is performed with one more cluster than the number
corresponding to the product maximum. The segmentation produced,
however, corresponds to the previous clustering, that is, the product
maximum.
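The stopping rule of the abbreviated run can be sketched as follows.
The callable `quality_for` is hypothetical: it stands in for one
complete clustering pass at a given number of clusters followed by
evaluation of the product measure.

```python
def abbreviated_run(quality_for, k_min=2, k_max=16):
    """Return the cluster count at the first product maximum.

    quality_for: callable mapping a cluster count to its quality measure.
    The loop stops one cluster past the maximum and reports the previous
    count, mirroring the "abbreviated run" described in the text.
    """
    best_k, best_q = k_min, quality_for(k_min)
    for k in range(k_min + 1, k_max + 1):
        q = quality_for(k)
        if q < best_q:          # the product has started to fall
            break
        best_k, best_q = k, q
    return best_k
```

Note that this rule stops at the first local maximum, so a later,
higher peak would be found only by a full run out to 16 clusters.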

The computation was performed on a PDP-10 computer utilizing
a BBN Tenex operating system. The programs were written in FORTRAN.
The clustering is performed on every fourth pixel and every
fourth line, while the segmentation is performed on every line and
every pixel. Only one line is stored in core memory at a time, to
reduce core requirements. The one-line-at-a-time clustering also
speeds up the procedure considerably, since the machine is
time-shared, and the computer core image must be stored on disk
each time the allocated time for the particular user expires.
Excessive core storage requires excessive switch in/switch out time,
and substantially increases the CPU time required for program
execution.

Several steps might be taken to reduce the computer time required
for performance of the algorithm. The mode filters might be
implemented more efficiently by modifying the local histogram as the
window is moved instead of recomputing it at every pixel. If the region
size desired in the segmentation is known a priori, the mode
filtering might be performed on only one window size, instead of several
window sizes, as is currently done.
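The incremental histogram update suggested above can be sketched as follows. This is an assumption-laden illustration rather than the author's implementation: the function name, the edge-replicating border handling, the odd window size, and the tie-breaking rule (lowest gray level wins) are all choices made here for the example.

```python
import numpy as np


def mode_filter_sliding(img: np.ndarray, size: int, levels: int = 256) -> np.ndarray:
    """Mode of each size x size neighbourhood (size assumed odd).

    The local histogram is built once per row and then updated as the
    window slides, instead of being recomputed at every pixel.
    """
    r = size // 2
    # Assumption: borders are handled by replicating edge pixels.
    padded = np.pad(img, r, mode="edge").astype(np.intp)
    rows, cols = img.shape
    out = np.empty((rows, cols), dtype=np.intp)
    for y in range(rows):
        # Full histogram for the first window position in this row.
        hist = np.bincount(padded[y:y + size, 0:size].ravel(), minlength=levels)
        out[y, 0] = hist.argmax()
        for x in range(1, cols):
            # Slide right: remove the column that left the window,
            # add the column that entered it.
            np.subtract.at(hist, padded[y:y + size, x - 1], 1)
            np.add.at(hist, padded[y:y + size, x + size - 1], 1)
            out[y, x] = hist.argmax()
    return out
```

Each step touches only 2 x size pixels rather than size x size, so the saving grows with the window size and is largest for the 15 x 15 filters.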

It might be that a suitable fixed transformation of the features
would suffice for a class of pictures. If this were so, the
transformation could be computed once, and used for every picture to be
segmented. Alternatively, a preliminary feature rejection based on
eigenvalue could be performed, eliminating some of the features prior to
the first clustering operation.

Since feature computation comprises roughly one-third (see
Table 5.3) of the computer time required, a smaller number of
easy-to-compute features would clearly speed up the process.
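The "roughly one-third" figure can be checked directly from the times listed in Table 5.3. The short sketch below does the arithmetic; the helper names are illustrative, and the time strings are copied from the table.

```python
def to_seconds(t: str) -> int:
    """Convert 'H:M:S' or 'M:S' to a number of seconds."""
    secs = 0
    for part in t.split(":"):
        secs = secs * 60 + int(part)
    return secs


# CPU times from Table 5.3 (12 feature set, full runs).
full_times = {
    "feature computation": "51:46",
    "feature rotation": "8:09",
    "initial clustering": "1:31:32",
    "final clustering": "21:10",
    "segmentation": "1:13",
}

# CPU times from Table 5.3 (4 feature set, abbreviated run).
abbreviated_times = {
    "feature computation": "5:21",
    "feature rotation": "1:33",
    "initial clustering": "6:10",
    "final clustering": "1:59",
    "segmentation": "1:35",
}


def feature_fraction(steps: dict) -> float:
    """Fraction of total CPU time spent computing features."""
    total = sum(to_seconds(t) for t in steps.values())
    return to_seconds(steps["feature computation"]) / total


frac_full = feature_fraction(full_times)
frac_abbreviated = feature_fraction(abbreviated_times)
```

The step times sum to the tabulated totals (2:53:50 and 16:38), and feature computation accounts for about 30% of the full run and about 32% of the abbreviated run.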

Improved computational efficiency could also be obtained by
writing some of the repetitive calculations as machine-language
subroutines, instead of in FORTRAN, as the algorithm is currently
implemented.

5.5 Comparison Measure Results

The results of comparing numerous segmentations of the APC
image using the comparison measure are tabulated in Table 5.4. The
highest number in the table is 96% and occurs in the comparison of
the 2 cluster segmentation of the 4 best non-rotated APC feature set
with the 2 cluster segmentation of the 1 best rotated APC feature set.
Both of these segmentations were the product maximum for the
respective feature sets. It can be seen from Figures 5.2 and 5.4 that
these segmentations appear almost identical, which is intuitively
satisfying since the comparison measure should correlate with human
perception. It should also be noted that the two segmentations
are negatives of each other, that is, the numerical values assigned
to "APC" and "background" are opposite in the respective
segmentations. The comparison measure effectively ignores the absolute
value of the points by selecting the elements in the joint histogram
which are larger.
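The label-independence described above can be illustrated with a small sketch. The exact definition of the comparison measure is given elsewhere in the dissertation and is not reproduced in this section, so the form used here (take the largest joint-histogram entry in each row, sum, and divide by the pixel count) is an assumption chosen only to show why negated labelings still score as identical.

```python
import numpy as np


def comparison_measure(seg_a, seg_b) -> float:
    """Assumed form of a label-independent agreement score in [0, 1]."""
    a = np.asarray(seg_a).ravel()
    b = np.asarray(seg_b).ravel()
    # Map arbitrary label values to 0..n-1 so they can index a histogram.
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1), dtype=np.int64)
    # joint[i, j] = number of pixels labeled i in A and j in B.
    np.add.at(joint, (a_idx, b_idx), 1)
    # Keep only the larger element in each row: which numerical label a
    # region carries no longer matters, only how the regions overlap.
    return float(joint.max(axis=1).sum()) / a.size


seg = np.array([[0, 0, 1], [0, 1, 1]])
negative = 1 - seg  # same regions, opposite numerical labels
score_negative = comparison_measure(seg, negative)
score_partial = comparison_measure([0, 0, 1, 1], [0, 1, 1, 1])
```

In this assumed form, a segmentation and its negative score 1.0, while the second pair, which disagrees on one pixel in four, scores 0.75. The row-wise maximum is not symmetric in its two arguments, which is one of several details where the actual measure may differ.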

The ideal use for the quality measure would be to compare
segmentations made by different procedures to a "standard"
segmentation, perhaps created by a human observer. If a "standard" set of
segmentations existed, different procedures for segmenting images
could be compared to the standard segmentations, and a numerical
indication of effectiveness could be derived.

Ultimately, the effectiveness of a segmentation procedure will
depend on the purpose for which the segmentation is performed. If
the subdivision of one segment recognized by a human observer into
several segments is of no consequence, then those elements of the
joint histogram corresponding to the subsegments could be combined
and the comparison measure would be higher. In any event, informed
use of the comparison measure should permit numerical measure of
segmentation performance, an ability long lacking in image
understanding system research.

Figure 5.1 (continued) APC Image Original Features

FIGURE 2. 12 NON REDUCED CORRELATED FEATURES

Figure 5.3 APC Image Rotated Features

Figure 5.3 (continued) APC Image Rotated Features
(panels: Rotated Feature 7, Rotated Feature 8)

Figure 5.4 Segmentations - 12 APC Image Rotated Features

Figure 5.4 (continued) Segmentations - 1 Best APC Rotated Feature

Figure 5.6 (continued) House Image Results

Figure 5.7 Multispectral Images Result

Figure 5.8 (continued) Average Bhattacharyya Distance vs. Number of Clusters

Figure 5.9 Probability Weighted Product vs. Number of Clusters
Table 5.1 (Continued)


APC Image

Rotated                      Normalized Average
Feature No.    Eigenvalue    Bhattacharyya Distance    Rank
     1            849.3              4.39                1
     2            377.3              0.61                6
     3            279.3              1.46                3
     4            149.3              1.05                4
     5             81.8              0.17               11
     6             78.6              0.72                5
     7             46.8              0.17               11
     8             26.4              1.72                2
     9              3.2              0.34                9
    10             11.1              0.57                7
    11             13.6              0.53                8
    12              6.6              0.27               10

Overall Average = 0.36 (Normalizing factor)


Aerial Image

Rotated                      Normalized Average
Feature No.    Eigenvalue    Bhattacharyya Distance    Rank
     1            550.6              3.02                1
     2            308.6              2.51                2
     3            209.7              1.11                5
     4            161.7              1.28                4
     5            115.1              1.80                3
     6             62.9              0.30                8
     7             46.7              0.19               11
     8             25.2              0.29                9
     9              7.2              0.17               12
    10              8.9              0.23               10
    11             10.8              0.77                6
    12              3.9              0.32                7

Overall Average = 0.19 (Normalizing factor)


Table 5.2 Eigenvalues vs. Bhattacharyya Distances


House Image

Rotated                      Normalized Average
Feature No.    Eigenvalue    Bhattacharyya Distance    Rank
     1           1559.2             12.35                1
     2            541.8              0.09                9
     3            242.8              0.24                5
     4             76.3              0.03               11
     5             69.4              0.09                9
     6             47.2              0.15                8
     7             30.3              0.15                8
     8             17.6              0.04               10
     9             13.3              0.17                6
    10              4.7              0.16                7
    11              3.8              0.24                5
    12              1.9              0.41                3
    13              0.8              0.15                8
    14              0.4              0.25                4
    15              1.3              0.47                2

Overall Average = 0.43 (Normalizing factor)


Multi-Spectral Images (Non-Augmented)

Rotated                      Normalized Average
Feature No.    Eigenvalue    Bhattacharyya Distance    Rank
     1            508.3              4.88                1
     2             93.6              0.45                6
     3             16.2              0.02                9
     4             12.3              0.02                9
     5              3.1              0.58                5
     6              1.3              0.38                7
     7              0.9              0.23                8
     8              0.4              0.66                4
     9              0.2              0.96                3
    10              0.3              1.82                2

Overall Average = 0.18 (Normalizing factor)


Table 5.2 (Continued) Eigenvalues vs. Bhattacharyya Distances

12 Feature Set - Full Runs

Operation                      CPU Time Required
                               (Hours:Minutes:Seconds)
Compute Basic Features                 1:54
3 x 3 Mode Filters                     4:02
7 x 7 Mode Filters                     9:48
15 x 15 Mode Filters                  36:02
Total Feature Computation             51:46
Feature Rotation                       8:09
Initial Clustering                  1:31:32
Final Clustering                      21:10
Segmentation                           1:13
TOTAL                               2:53:50


4 Feature Set - Abbreviated Run (Typical)

Operation                      CPU Time Required
                               (Minutes:Seconds)
3 x 3 Mode Filter                      1:25
7 x 7 Mode Filter                      3:56
Total Feature Computation              5:21
Feature Rotation                       1:33
Initial Clustering                     6:10
Final Clustering                       1:59
Segmentation                           1:35
TOTAL                                 16:38

Table 5.3 Computer Time

Chapter 6

REAL TIME IMPLEMENTATION

With certain minor modifications, the segmentation algorithm
described in this report can be adapted to near real time operation.
In the sense used here, near real time implies operation at standard
TV rates.

6.1 Feature Computation

Figure 6.1 is a block diagram of a hypothetical system. The
feature computer computes the features in real time from the input
television image. The technology for this block of the system is in
development [6-1] on charge-coupled-device (CCD) hardware and may
even be implemented on the focal plane of a multi-element sensor.
This conceptualization is sometimes called the "smart sensor" design.

The raw features are then forwarded to the feature rotator. The
feature rotator performs a real time multidimensional rotation of the
input vector, that is, each component of the output vector is a
weighted sum of the input vector components. The weights are a function of
the picture statistics, specifically the picture covariance matrix
which is computed and diagonalized by the statistical computer. The
statistical computer may consist of a combination of a
microprocessor and other hardware. It is a reasonable assumption that the picture
statistics will not change substantially over a small number of frames.
The statistical computer will therefore not have to compute and
diagonalize the covariance matrix in a single scan, but may take
several scans to perform this computation.

Figure 6.1 Real Time Segmentation System
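The roles of the statistical computer and the feature rotator can be sketched in software as follows. This is an illustrative sketch using NumPy, not a description of the proposed hardware; the function names are hypothetical, and subtracting the mean before rotation is an assumption made here for convenience.

```python
import numpy as np


def rotation_from_statistics(features: np.ndarray):
    """Statistical computer: diagonalize the picture covariance matrix.

    features has shape (n_pixels, n_features). Returns the eigenvalues
    in descending order and the matching rotation matrix (columns are
    eigenvectors).
    """
    cov = np.cov(features, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input, ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]


def rotate(features: np.ndarray, rotation: np.ndarray, mean: np.ndarray) -> np.ndarray:
    """Feature rotator: each output component is a weighted sum of inputs."""
    # Assumption: the mean is removed first; the text does not say so.
    return (features - mean) @ rotation
```

Because the rotation matrix depends only on slowly varying statistics, it can be recomputed over several frames and then applied unchanged to each incoming feature vector.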

6.2 Segmentor

The heart of the system is the segmentor. This device accepts
the incoming rotated feature vector and the cluster means from the
mean computer and assigns each incoming vector to the nearest
cluster mean. The output of the segmentor is therefore a scalar
corresponding to the index number of the cluster to which the vector has
been assigned. The segmentor must accept an input from the cluster
data computer through the mean computer which defines those
features to be ignored in the assignment of the vectors. The ignoring
of features in the vector assignment is equivalent to feature reduction
in the algorithm which has been implemented in software on a digital
computer.
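The segmentor's assignment rule can be sketched as a nearest-mean classifier with a feature mask. The squared Euclidean distance and the boolean-mask representation of "features to be ignored" are assumptions made for this illustration; the function and parameter names are hypothetical.

```python
import numpy as np


def segment(vectors: np.ndarray, means: np.ndarray, use_feature: np.ndarray) -> np.ndarray:
    """Assign each rotated feature vector to the nearest cluster mean.

    vectors:     (n_pixels, n_features) rotated feature vectors
    means:       (n_clusters, n_features) cluster means
    use_feature: boolean mask, False for features to be ignored
    Returns the cluster index for each vector.
    """
    v = vectors[:, use_feature]
    m = means[:, use_feature]
    # Squared distance from every vector to every mean (assumed metric).
    d2 = ((v[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


means = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 100.0]])
vectors = np.array([[1.0, 1.0, 100.0], [9.0, 9.0, 0.0]])
labels_all = segment(vectors, means, np.array([True, True, True]))
labels_masked = segment(vectors, means, np.array([True, True, False]))
```

In this toy case, ignoring the third feature changes both assignments, which is the effect of feature reduction carried out by masking rather than by removing columns.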

6.3 Mean Computer

The mean computer accepts the incoming feature vectors and the
output of the segmentor and recomputes the cluster means for use in
segmenting the next frame of picture data. The effect of this
procedure is that the current frame is always being segmented based on the
means computed during the previous frame. The assumption is made
that the cluster means change slowly with respect to the frame time
(usually 1/60 second).

A functional diagram of the mean computer is shown in Figure
6.2. The vector switch accepts the input feature vector and switches
it to the summing register specified by the output of the segmentor.
The classification summing register also records the number of
vectors in each cluster, for division of the vector sums at the
completion of the frame. The output at the end of the frame consists of the
vector sums in the vector summing registers divided by the number
of vectors that contributed to each respective sum. These new cluster
means are then forwarded to the segmentor for segmentation of the
next incoming frame.
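The summing registers and end-of-frame division can be sketched as follows. The handling of a cluster that receives no vectors in a frame is not specified in the text; keeping its previous mean is an assumption made here, and the names are illustrative.

```python
import numpy as np


def update_means(vectors: np.ndarray, labels: np.ndarray, old_means: np.ndarray):
    """Recompute cluster means from one frame of segmentor output.

    vectors:   (n_pixels, n_features) feature vectors for the frame
    labels:    (n_pixels,) cluster index from the segmentor
    old_means: (n_clusters, n_features) means used to segment this frame
    Returns (new_means, counts).
    """
    k, d = old_means.shape
    sums = np.zeros((k, d))                 # vector summing registers
    counts = np.zeros(k, dtype=np.int64)    # vectors per cluster
    np.add.at(sums, labels, vectors)        # "vector switch": route by label
    np.add.at(counts, labels, 1)
    new_means = old_means.astype(float)     # astype returns a copy
    nonempty = counts > 0
    # Assumption: an empty cluster keeps its previous mean.
    new_means[nonempty] = sums[nonempty] / counts[nonempty, None]
    return new_means, counts
```

Only the sums and counts need to be held during the frame, which is why the scheme does not require storing the image itself.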

6.4 Cluster Data Computer

The purpose of the cluster data computer (Figure 6.1) is
threefold. First, it decides when the mean computer/segmentor loop has
converged for a fixed number of clusters. Convergence will be
assumed to have occurred when the previous means and the current
means differ by less than some amount in an appropriate norm. The
second function of the cluster data computer will be to evaluate the
rotated features with respect to usefulness and specify to the
segmentor those features to be ignored. The remaining function of the cluster
data computer is to decide on the number of clusters in the data and
to specify to the mean computer how many clusters are present. At
this point, the real time algorithm deviates from the computer
algorithm as implemented currently. Implementation of the current
algorithm would require three sets of segmentation hardware, that is,
3 segmentors and 3 mean computers. One of these sets would
perform clustering for the assumed number of clusters, N. The other
two sets would perform clustering for N-1 and N+1 clusters,
respectively. The number of clusters would be incremented or decremented
as necessary to maintain the quality parameter (see Chapter 4) at
maximum for N clusters.

The requirement for 3 sets of segmentation hardware to
determine the intrinsic number of clusters in the data is cumbersome and
inefficient. It is suggested that alternate procedures be used to
accomplish the same result with one set of segmentation hardware. A
suggested starting point would be for the algorithm to attempt to
maintain the ratio of within to between scatter measures at some fixed
(possibly operator adjustable) value. Suitable hysteresis would be
necessary to prevent a low level limit cycle about the fixed value.
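One way to read the hysteresis suggestion is as a dead band around the target ratio. The sketch below is an assumption throughout: the direction of adjustment (add a cluster when the within-to-between ratio is too high), the symmetric band, and all names and limits are illustrative choices, not taken from the text.

```python
def adjust_cluster_count(n: int, ratio: float, target: float, band: float,
                         n_min: int = 2, n_max: int = 16) -> int:
    """Return the cluster count to use next, with a hysteresis dead band.

    ratio  : measured within-to-between scatter ratio for n clusters
    target : desired ratio (possibly operator adjustable)
    band   : half-width of the dead band in which n is left unchanged
    """
    if ratio > target + band and n < n_max:
        # Clusters are too diffuse relative to their separation.
        return n + 1
    if ratio < target - band and n > n_min:
        # Clusters are tighter than required; one can be removed.
        return n - 1
    # Inside the dead band: hold n so small fluctuations in the ratio
    # do not toggle the cluster count from frame to frame.
    return n
```

Without the band, a ratio hovering near the target would cause the count to alternate between two values indefinitely, which is the limit cycle the text warns about.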

The cluster data computer will most likely require more than
one frame to compute the measures necessary to set the number of
clusters. In addition, it must wait until the inner loop comprised of
the segmentor and mean computer has converged in order to begin
computation of these measures. Further, as the number of clusters
is incremented or decremented, the cluster data computer must
decide which cluster to eliminate or where to initialize a new cluster
center. The suggested procedure is to combine the cluster center
pair having minimum separation and to initialize new clusters at the
vector furthest from its respective cluster center, as is done
currently by the computer algorithm. An alternate procedure would be to
split the cluster having greatest variance into two clusters, each 1
standard deviation removed from the previous cluster center. It
should be noted that an additional respect in which the real time
procedure suggested here differs from the computer procedure described
previously is that the real time procedure uses every pixel for every
calculation. This differs from the computer algorithm, in which a
sampling procedure is used to reduce the computer time required.
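The suggested merge and initialization steps can be sketched as follows. The text does not say how the two closest centers are combined; the count-weighted average used here is an assumption, as are the Euclidean metric and the function names.

```python
import numpy as np


def merge_closest(means: np.ndarray, counts: np.ndarray) -> np.ndarray:
    """Combine the pair of cluster centers with minimum separation."""
    d = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # ignore each center's distance to itself
    i, j = np.unravel_index(np.argmin(d), d.shape)
    # Assumption: the merged center is the count-weighted average.
    merged = (counts[i] * means[i] + counts[j] * means[j]) / (counts[i] + counts[j])
    keep = [k for k in range(len(means)) if k not in (i, j)]
    return np.vstack([means[keep], merged])


def new_center(vectors: np.ndarray, labels: np.ndarray, means: np.ndarray) -> np.ndarray:
    """Pick the vector furthest from its own cluster center."""
    dist = np.linalg.norm(vectors - means[labels], axis=1)
    return vectors[np.argmax(dist)]
```

In hardware the furthest vector could be tracked with a single running maximum per frame, so neither step needs the full image to be stored.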

6.5 Preliminary Functional Requirements

A preliminary estimate of the accuracy and overall register size
necessary to achieve useful results for this system is given in Table
6.1. These values are derived from the subjective judgment of the
author only, based on results obtained with the computer algorithm.
Minimal, most probable, and maximal requirements are given since
the cost of implementation in terms of hardware may be very
nonlinear and an intelligent compromise can often mean large hardware
cost savings.

Parameter      Minimal      Most Probable      Maximal


6.6 Motion Picture Segmentation Results

The algorithm described in Chapter 4 was used to segment two
frames of a motion picture of a chemical plant. The results of these
segmentations are shown in Figure 6.3, along with the original
photographs. The motion picture was taken from a moving aircraft, and
the originals are not spatially registered, as can be seen. They are
five frames apart in the motion picture.

The two segmentations, however, appear quite similar, and
support the hypothesis that the statistical structure of the data can
be identified for the purposes of segmentation even when the pictures
are not spatially registered. The mean vectors for the 12 feature
sets are shown in Table 6.2. The first mean, which is the mean of
the original image, differs between the two images by about 12%.
This is presumed to be caused by frame to frame
exposure/development differences between the two frames. The difference in means
causes a corresponding difference in variances, and the rotation
matrices for the two feature sets are sufficiently different to prevent
ideal tracking of the cluster means, since they are effectively
represented with respect to different bases.

If a real time system is implemented, and frame to frame
amplitude differences are expected, either appropriate scaling will be
required or the rotation matrix will have to be forced to change
slowly. The effect of this procedure would be to rotate image feature sets
with a non-optimal rotation matrix. Since the rotation is performed
to permit feature rejection in decorrelated space, the penalty for
this procedure will most likely be small.


Figure 6.3 Motion Picture Results
(panels: Original - Frame 1; Original - Frame 5;
Segmentation - Frame 1 (4 Clusters); Segmentation - Frame 5 (4 Clusters))


               Mean
Feature    Frame 1    Frame 5
   1          88        100
   2         199        198
   3          94         96
   4          83         97
   5         166        166
   6          94         97
   7          84         90
   8         137        132
   9          93         93
  10          79         87
  11         134        128
  12          94         88

Table 6.2 Motion Picture Means


Chapter  7 


CONCLUSIONS 

This dissertation has presented a procedure for gross
segmentation of digital imagery. The procedure uses an unsupervised method,
and requires no human interaction or adjustable thresholds. There
are disadvantages to using an unsupervised approach. So little is
known about the human perceptual system that the resulting
segmentations will usually not be as satisfying as segmentations made by a
human being or those performed by a carefully trained segmentor
operating in a supervised mode. Additionally, the segmentor has no
knowledge of the intent of the segmentation except that provided
implicitly through the features selected to be used.

There are, however, advantages to the unsupervised approach.
The construction of a set of data to use during the training phase of the
supervised approach is time consuming and tedious. Additionally,
the supervised method is incapable of satisfactory performance in
situations where the statistics of the scene vary substantially.
Situations that are likely to encounter such statistics are those in which
the sensor characteristics vary and those in which near real time
segmentation of real images is desired. The difference in appearance
with weather, time of day and terrain makes an unsupervised
procedure mandatory.

The procedure outlined herein lends itself to near real-time
implementation. While the design of such a system to operate at
television rates will require considerable ingenuity on the part of the
circuit designers, it is felt that such a system is well within the state
of the art at this writing. Such a system should find wide application
in target recognition/tracking systems and possibly may be used to
solve the problem of cross-correlation of the same scene observed
by sensors of radically different characteristics. With some
generalization of the concept of cross-correlation, segmentations of the same
scene viewed by different sensors can be compared.

The unsupervised approach may also reveal characteristics in
the data image that were unobserved by the human observer. There
may exist inherent clusters in the data that passed unnoticed by
human beings. Use of a supervised procedure will tend to further mask
these unobserved characteristics, as the training of the classifier
effectively instructs the classifier to ignore these characteristics.
The unsupervised approach may eventually find usefulness in image
enhancement because of the ability to detect unnoticed structure in
the data.

A comparison measure was introduced and utilized to compare
different segmentations of the same scene. This comparison measure
would be particularly useful in comparing segmentations of an image
performed by a candidate procedure with a standard segmentation of
the same image. In addition, the comparison measure may be the
basis from which to proceed on defining a generalized
cross-correlation function for use in cross-correlating different sensor outputs.

Further work is certainly necessary in understanding the human
perceptual system at its intermediate level and using this knowledge
to develop features to improve the performance of the segmentor.
It may well develop that some textural recognition processes occur
at a fairly high level in the human perceptual system and do not lend
themselves to implementation in the lower levels of an image
understanding system. If so, the improved understanding of the human
perceptual system will prove valuable as much for what it indicates
cannot be done as for its indications of what can be done.

The clarification of what is meant precisely by a "segmented
image" is also an avenue for further investigation. If a
"well-segmented image" can be represented by a mathematical criterion, then
analysis based on picture statistics will almost certainly provide
suggestions on how to improve segmentor performance. In addition,
it will provide means for predicting hypothetical system performance
without having to build and test the system.

Much of the usefulness of an image segmentation system must be
determined by application. The current state of the art in image
understanding systems is such that applications are just now being
postulated, much less implemented and tested. The advantages of the
procedure described herein seem to be twofold.

First, the procedure provides the cluster means directly as a
by-product of the segmentation process. This is opposed to the
previous procedures, which segment the scene with boundary detection
methods, compute features inside the boundaries, and only then
perform clustering to determine the means.

A second advantage of this procedure is its potential for real
time implementation. Many previous procedures have required exact
spatial stationarity of the image data to permit the iterations
necessary to perform segmentation. This procedure requires only that the
picture statistics change slowly with time, and does not require
storing the entire image at one time. Such a procedure will have
clear advantages when the sensor is mounted on a moving platform
as in target detection/recognition systems.